Mercurial > hg > xemacs-beta

diff src/file-coding.c @ 4690:257b468bf2ca
Move the #'query-coding-region implementation to C. This is necessary because there is no reasonable way to access the corresponding mswindows-multibyte functionality from Lisp, and we need such functionality if we're going to have a reliable and portable #'query-coding-region implementation. However, this change doesn't yet provide #'query-coding-region for the mswindow-multibyte coding systems, there should be no functional differences between an XEmacs with this change and one without it. src/ChangeLog addition: 2009-09-19 Aidan Kehoe <kehoea@parhasard.net> Move the #'query-coding-region implementation to C. This is necessary because there is no reasonable way to access the corresponding mswindows-multibyte functionality from Lisp, and we need such functionality if we're going to have a reliable and portable #'query-coding-region implementation. However, this change doesn't yet provide #'query-coding-region for the mswindow-multibyte coding systems, there should be no functional differences between an XEmacs with this change and one without it. * mule-coding.c (struct fixed_width_coding_system): Add a new coding system type, fixed_width, and implement it. It uses the CCL infrastructure but has a much simpler creation API, and its own query_method, formerly in lisp/mule/mule-coding.el. * unicode.c: Move the Unicode query method implementation here from unicode.el. * lisp.h: Declare Fmake_coding_system_internal, Fcopy_range_table here. * intl-win32.c (complex_vars_of_intl_win32): Use Fmake_coding_system_internal, not Fmake_coding_system. * general-slots.h: Add Qsucceeded, Qunencodable, Qinvalid_sequence here. * file-coding.h (enum coding_system_variant): Add fixed_width_coding_system here. (struct coding_system_methods): Add query_method and query_lstream_method to the coding system methods. Provide flags for the query methods. Declare the default query method; initialise it correctly in INITIALIZE_CODING_SYSTEM_TYPE. * file-coding.c (default_query_method): New function, the default query method for coding systems that do not set it. Moved from coding.el. (make_coding_system_1): Accept new elements in PROPS in #'make-coding-system; aliases, a list of aliases; safe-chars and safe-charsets (these were previously accepted but not saved); and category. (Fmake_coding_system_internal): New function, what used to be #'make-coding-system--on Mule builds, we've now moved some of the functionality of this to Lisp. (Fcoding_system_canonical_name_p): Move this earlier in the file, since it's now called from within make_coding_system_1. (Fquery_coding_region): Move the implementation of this here, from coding.el. (complex_vars_of_file_coding): Call Fmake_coding_system_internal, not Fmake_coding_system; specify safe-charsets properties when we're a mule build. * extents.h (mouse_highlight_priority, Fset_extent_priority, Fset_extent_face, Fmap_extents): Make these available to other C files. lisp/ChangeLog addition: 2009-09-19 Aidan Kehoe <kehoea@parhasard.net> Move the #'query-coding-region implementation to C. * coding.el: Consolidate code that depends on the presence or absence of Mule at the end of this file. (default-query-coding-region, query-coding-region): Move these functions to C. (default-query-coding-region-safe-charset-skip-chars-map): Remove this variable, the corresponding C variable is Vdefault_query_coding_region_chartab_cache in file-coding.c. (query-coding-string): Update docstring to reflect actual multiple values, be more careful about not modifying a range table that we're currently mapping over. (encode-coding-char): Make the implementation of this simpler. (featurep 'mule): Autoload #'make-coding-system from mule/make-coding-system.el if we're a mule build; provide an appropriate compiler macro. Do various non-mule compatibility things if we're not a mule build. * update-elc.el (additional-dump-dependencies): Add mule/make-coding-system as a dump time dependency if we're a mule build. * unicode.el (ccl-encode-to-ucs-2): (decode-char): (encode-char): Move these earlier in the file, for the sake of some byte compile warnings. (unicode-query-coding-region): Move this to unicode.c * mule/make-coding-system.el: New file, not dumped. Contains the functionality to rework the arguments necessary for fixed-width coding systems, and contains the implementation of #'make-coding-system, which now calls #'make-coding-system-internal. * mule/vietnamese.el (viscii): * mule/latin.el (iso-8859-2): (windows-1250): (iso-8859-3): (iso-8859-4): (iso-8859-14): (iso-8859-15): (iso-8859-16): (iso-8859-9): (macintosh): (windows-1252): * mule/hebrew.el (iso-8859-8): * mule/greek.el (iso-8859-7): (windows-1253): * mule/cyrillic.el (iso-8859-5): (koi8-r): (koi8-u): (windows-1251): (alternativnyj): (koi8-ru): (koi8-t): (koi8-c): (koi8-o): * mule/arabic.el (iso-8859-6): (windows-1256): Move all these coding systems to being of type fixed-width, not of type CCL. This allows the distinct query-coding-region for them to be in C, something which will eventually allow us to implement query-coding-region for the mswindows-multibyte coding systems. * mule/general-late.el (posix-charset-to-coding-system-hash): Document why we're pre-emptively persuading the byte compiler that the ELC for this file needs to be written using escape-quoted. Call #'set-unicode-query-skip-chars-args, now the Unicode query-coding-region implementation is in C. * mule/thai-xtis.el (tis-620): Don't bother checking whether we're XEmacs or not here. * mule/mule-coding.el: Move the eight bit fixed-width functionality from this file to make-coding-system.el. tests/ChangeLog addition: 2009-09-19 Aidan Kehoe <kehoea@parhasard.net> * automated/mule-tests.el: Check a coding system's type, not an 8-bit-fixed property, for whether that coding system should be treated as a fixed-width coding system. * automated/query-coding-tests.el: Don't test the query coding functionality for mswindows-multibyte coding systems, it's not yet implemented.
author: Aidan Kehoe <kehoea@parhasard.net>
date: Sat, 19 Sep 2009 22:53:13 +0100
parents: e4ed58cb0e5b
children: a9833e8a32ec e0db3c197671
--- a/src/file-coding.c	Sat Sep 19 17:56:23 2009 +0200
+++ b/src/file-coding.c	Sat Sep 19 22:53:13 2009 +0100
@@ -78,6 +78,9 @@
 #include "lstream.h"
 #include "opaque.h"
 #include "file-coding.h"
+#include "extents.h"
+#include "rangetab.h"
+#include "chartab.h"
 
 #ifdef HAVE_ZLIB
 #include "zlib.h"
@@ -89,10 +92,17 @@
 Lisp_Object Vcoding_system_for_write;
 Lisp_Object Vfile_name_coding_system;
 
+Lisp_Object Qaliases, Qcharset_skip_chars_string;
+
 #ifdef DEBUG_XEMACS
 Lisp_Object Vdebug_coding_detection;
 #endif
 
+#ifdef MULE
+extern Lisp_Object Vcharset_ascii, Vcharset_control_1,
+  Vcharset_latin_iso8859_1;
+#endif
+
 typedef struct coding_system_type_entry
 {
   struct coding_system_methods *meths;
@@ -417,6 +427,155 @@
   return decode_coding_system_type (type, ERROR_ME_NOT) != 0;
 }
 
+#ifdef MULE
+static Lisp_Object Vdefault_query_coding_region_chartab_cache;
+
+/* Non-static because it's used in INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA. */
+Lisp_Object
+default_query_method (Lisp_Object codesys, struct buffer *buf,
+                      Charbpos end, int flags)
+{
+  Charbpos pos = BUF_PT (buf), fail_range_start, fail_range_end;
+  Charbpos pos_byte = BYTE_BUF_PT (buf);
+  Lisp_Object safe_charsets = XCODING_SYSTEM_SAFE_CHARSETS (codesys);
+  Lisp_Object safe_chars = XCODING_SYSTEM_SAFE_CHARS (codesys),
+    result = Qnil;
+  enum query_coding_failure_reasons failed_reason,
+    previous_failed_reason = query_coding_succeeded;
+
+  /* safe-charsets of t means the coding system can encode everything. */
+  if (EQ (Qnil, safe_chars))
+    {
+      if (EQ (Qt, safe_charsets))
+        {
+          return Qnil;
+        }
+
+      /* If we've no information on what characters the coding system can
+         encode, give up. */
+      if (EQ (Qnil, safe_charsets) && EQ (Qnil, safe_chars))
+        {
+          return Qunbound;
+        }
+
+      safe_chars = Fgethash (safe_charsets,
+                             Vdefault_query_coding_region_chartab_cache, 
+                             Qnil);
+      if (NILP (safe_chars))
+        {
+          safe_chars = Fmake_char_table (Qgeneric);
+          {
+            EXTERNAL_LIST_LOOP_2 (safe_charset, safe_charsets)
+              Fput_char_table (safe_charset, Qt, safe_chars);
+          }
+
+          Fputhash (safe_charsets, safe_chars,
+                    Vdefault_query_coding_region_chartab_cache);
+        }
+    }
+
+  if (flags & QUERY_METHOD_HIGHLIGHT && 
+      /* If we're being called really early, live without highlights getting
+         cleared properly: */
+      !(UNBOUNDP (XSYMBOL (Qquery_coding_clear_highlights)->function)))
+    {
+      /* It's okay to call Lisp here, the only non-stack object we may have
+         allocated up to this point is safe_chars, and that's
+         reachable from its entry in
+         Vdefault_query_coding_region_chartab_cache */
+      call3 (Qquery_coding_clear_highlights, make_int (pos), make_int (end),
+             wrap_buffer (buf));
+    }
+
+  while (pos < end)
+    {
+      Ichar ch = BYTE_BUF_FETCH_CHAR (buf, pos_byte);
+      if (!EQ (Qnil, get_char_table (ch, safe_chars)))
+        {
+          pos++;
+          INC_BYTEBPOS (buf, pos_byte);
+        }
+      else
+        {
+          fail_range_start = pos;
+          while ((pos < end) &&  
+                 (EQ (Qnil, get_char_table (ch, safe_chars))
+                  && (failed_reason = query_coding_unencodable))
+                 && (previous_failed_reason == query_coding_succeeded
+                     || previous_failed_reason == failed_reason))
+            {
+              pos++;
+              INC_BYTEBPOS (buf, pos_byte);
+              ch = BYTE_BUF_FETCH_CHAR (buf, pos_byte);
+              previous_failed_reason = failed_reason;
+            }
+
+          if (fail_range_start == pos)
+            {
+              /* The character can actually be encoded; move on. */
+              pos++;
+              INC_BYTEBPOS (buf, pos_byte);
+            }
+          else
+            {
+              assert (previous_failed_reason == query_coding_unencodable);
+
+              if (flags & QUERY_METHOD_ERRORP)
+                {
+                  DECLARE_EISTRING (error_details);
+
+                  eicpy_ascii (error_details, "Cannot encode ");
+                  eicat_lstr (error_details,
+                              make_string_from_buffer (buf, fail_range_start,
+                                                       pos -
+                                                       fail_range_start));
+                  eicat_ascii (error_details, " using coding system");
+
+                  signal_error (Qtext_conversion_error, 
+                                (const CIbyte *)(eidata (error_details)),
+                                XCODING_SYSTEM_NAME (codesys));
+                }
+
+              if (NILP (result))
+                {
+                  result = Fmake_range_table (Qstart_closed_end_open);
+                }
+
+              fail_range_end = pos;
+
+              Fput_range_table (make_int (fail_range_start), 
+                                make_int (fail_range_end),
+                                Qunencodable,
+                                result);
+              previous_failed_reason = query_coding_succeeded;
+
+              if (flags & QUERY_METHOD_HIGHLIGHT) 
+                {
+                  Lisp_Object extent
+                    = Fmake_extent (make_int (fail_range_start),
+                                    make_int (fail_range_end), 
+                                    wrap_buffer (buf));
+                  
+                  Fset_extent_priority
+                    (extent, make_int (2 + mouse_highlight_priority));
+                  Fset_extent_face (extent, Qquery_coding_warning_face);
+                }
+            }
+        }
+    }
+
+  return result;
+}
+#else
+Lisp_Object
+default_query_method (Lisp_Object UNUSED (codesys),
+                      struct buffer * UNUSED (buf),
+                      Charbpos UNUSED (end), int UNUSED (flags))
+{
+  return Qnil;
+}
+#endif /* defined MULE */
+
 DEFUN ("valid-coding-system-type-p", Fvalid_coding_system_type_p, 1, 1, 0, /*
 Given a CODING-SYSTEM-TYPE, return non-nil if it is valid.
 Valid types depend on how XEmacs was compiled but may include
@@ -982,6 +1141,16 @@
     }
 }
 
+DEFUN ("coding-system-canonical-name-p", Fcoding_system_canonical_name_p,
+       1, 1, 0, /*
+Return t if OBJECT names a coding system, and is not a coding system alias.
+*/
+       (object))
+{
+  return CODING_SYSTEMP (Fgethash (object, Vcoding_system_hash_table, Qnil))
+    ? Qt : Qnil;
+}
+
 /* Basic function to create new coding systems.  For `make-coding-system',
    NAME-OR-EXISTING is the NAME argument, PREFIX is null, and TYPE,
    DESCRIPTION, and PROPS are the same.  All created coding systems are put
@@ -1030,7 +1199,7 @@
   enum eol_type eol_wrapper = EOL_AUTODETECT;
   struct coding_system_methods *meths;
   Lisp_Object csobj;
-  Lisp_Object defmnem = Qnil;
+  Lisp_Object defmnem = Qnil, aliases = Qnil;
 
   if (NILP (type))
     type = Qundecided;
@@ -1119,15 +1288,55 @@
 	  CODING_SYSTEM_POST_READ_CONVERSION (cs) = value;
 	else if (EQ (key, Qpre_write_conversion))
 	  CODING_SYSTEM_PRE_WRITE_CONVERSION (cs) = value;
+        else if (EQ (key, Qaliases))
+          {
+            EXTERNAL_LIST_LOOP_2 (alias, value)
+              {
+                CHECK_SYMBOL (alias);
+
+                if (!NILP (Fcoding_system_canonical_name_p (alias)))
+                  {
+                    invalid_change ("Symbol is the canonical name of a "
+                                    "coding system and cannot be redefined",
+                                    alias);
+                  }
+              }
+            aliases = value;
+          }
 	/* FSF compatibility */
 	else if (EQ (key, Qtranslation_table_for_decode))
 	  ;
 	else if (EQ (key, Qtranslation_table_for_encode))
 	  ;
 	else if (EQ (key, Qsafe_chars))
-	  CODING_SYSTEM_SAFE_CHARS (cs) = value;
+          {
+            CHECK_CHAR_TABLE (value);
+            CODING_SYSTEM_SAFE_CHARS (cs) = value;
+          }
 	else if (EQ (key, Qsafe_charsets))
-	  CODING_SYSTEM_SAFE_CHARSETS (cs) = value;
+          {
+            if (!EQ (Qt, value) 
+                /* Would be nice to actually do this check, but there are
+                   some order conflicts with japanese.el and
+                   mule-coding.el  */
+                && 0)
+              {
+#ifdef MULE
+                EXTERNAL_LIST_LOOP_2 (safe_charset, value) 
+                  CHECK_CHARSET (Ffind_charset (safe_charset));
+#endif
+              }
+
+            CODING_SYSTEM_SAFE_CHARSETS (cs) = value;
+          }
+	else if (EQ (key, Qcategory))
+          {
+            Fput (name_or_existing, intern ("coding-system-property"),
+                  Fplist_put (Fget (name_or_existing, 
+                                    intern ("coding-system-property"),
+                                    Qnil), 
+                              Qcategory, value));
+          }
 	else if (EQ (key, Qmime_charset))
 	  ;
 	else if (EQ (key, Qvalid_codes))
@@ -1186,6 +1395,11 @@
 		    csobj));
       }
     XCODING_SYSTEM_EOL_TYPE (csobj) = eol_wrapper;
+
+    {
+      EXTERNAL_LIST_LOOP_2 (alias, aliases)
+        Fdefine_coding_system_alias (alias, csobj);
+    }
   }
 
   return csobj;
@@ -1199,339 +1413,16 @@
   return make_coding_system_1 (existing, prefix, type, description, props);
 }
 
-DEFUN ("make-coding-system", Fmake_coding_system, 2, 4, 0, /*
-Register symbol NAME as a coding system.
-
-TYPE describes the conversion method used and should be one of
-
-nil or `undecided'
-     Automatic conversion.  XEmacs attempts to detect the coding system
-     used in the file.
-`chain'
-     Chain two or more coding systems together to make a combination coding
-     system.
-`no-conversion'
-     No conversion.  Use this for binary files and such.  On output,
-     graphic characters that are not in ASCII or Latin-1 will be
-     replaced by a ?. (For a no-conversion-encoded buffer, these
-     characters will only be present if you explicitly insert them.)
-`convert-eol'
-     Convert CRLF sequences or CR to LF.
-`shift-jis'
-     Shift-JIS (a Japanese encoding commonly used in PC operating systems).
-`unicode'
-     Any Unicode encoding (UCS-4, UTF-8, UTF-16, etc.).
-`mswindows-unicode-to-multibyte'
-     (MS Windows only) Converts from Windows Unicode to Windows Multibyte
-     (any code page encoding) upon encoding, and the other way upon decoding.
-`mswindows-multibyte'
-     Converts to or from Windows Multibyte (any code page encoding).
-     This is resolved into a chain of `mswindows-unicode' and
-     `mswindows-unicode-to-multibyte'.
-`iso2022'
-     Any ISO2022-compliant encoding.  Among other things, this includes
-     JIS (the Japanese encoding commonly used for e-mail), EUC (the
-     standard Unix encoding for Japanese and other languages), and
-     Compound Text (the encoding used in X11).  You can specify more
-     specific information about the conversion with the PROPS argument.
-`big5'
-     Big5 (the encoding commonly used for Mandarin Chinese in Taiwan).
-`ccl'
-     The conversion is performed using a user-written pseudo-code
-     program.  CCL (Code Conversion Language) is the name of this
-     pseudo-code.
-`gzip'
-     GZIP compression format.
-`internal'
-     Write out or read in the raw contents of the memory representing
-     the buffer's text.  This is primarily useful for debugging
-     purposes, and is only enabled when XEmacs has been compiled with
-     DEBUG_XEMACS defined (via the --debug configure option).
-     WARNING: Reading in a file using `internal' conversion can result
-     in an internal inconsistency in the memory representing a
-     buffer's text, which will produce unpredictable results and may
-     cause XEmacs to crash.  Under normal circumstances you should
-     never use `internal' conversion.
-
-DESCRIPTION is a short English phrase describing the coding system,
-suitable for use as a menu item. (See also the `documentation' property
-below.)
-
-PROPS is a property list, describing the specific nature of the
-character set.  Recognized properties are:
-
-`mnemonic'
-     String to be displayed in the modeline when this coding system is
-     active.
-
-`documentation'
-     Detailed documentation on the coding system.
-
-`eol-type'
-     End-of-line conversion to be used.  It should be one of
-
-	nil
-		Automatically detect the end-of-line type (LF, CRLF,
-		or CR).  Also generate subsidiary coding systems named
-		`NAME-unix', `NAME-dos', and `NAME-mac', that are
-		identical to this coding system but have an EOL-TYPE
-		value of `lf', `crlf', and `cr', respectively.
-	`lf'
-		The end of a line is marked externally using ASCII LF.
-		Since this is also the way that XEmacs represents an
-		end-of-line internally, specifying this option results
-		in no end-of-line conversion.  This is the standard
-		format for Unix text files.
-	`crlf'
-		The end of a line is marked externally using ASCII
-		CRLF.  This is the standard format for MS-DOS text
-		files.
-	`cr'
-		The end of a line is marked externally using ASCII CR.
-		This is the standard format for Macintosh text files.
-	t
-		Automatically detect the end-of-line type but do not
-		generate subsidiary coding systems.  (This value is
-		converted to nil when stored internally, and
-		`coding-system-property' will return nil.)
-
-`post-read-conversion'
-     The value is a function to call after some text is inserted and
-     decoded by the coding system itself and before any functions in
-     `after-change-functions' are called. (#### Not actually true in
-     XEmacs. `after-change-functions' will be called twice if
-     `post-read-conversion' changes something.) The argument of this
-     function is the same as for a function in
-     `after-insert-file-functions', i.e. LENGTH of the text inserted,
-     with point at the head of the text to be decoded.
-
-`pre-write-conversion'
-     The value is a function to call after all functions in
-     `write-region-annotate-functions' and `buffer-file-format' are
-     called, and before the text is encoded by the coding system itself.
-     The arguments to this function are the same as those of a function
-     in `write-region-annotate-functions', i.e. FROM and TO, specifying
-     a region of text.
-
-
-
-The following properties are allowed for FSF compatibility but currently
-ignored:
-
-`translation-table-for-decode'
-     The value is a translation table to be applied on decoding.  See
-     the function `make-translation-table' for the format of translation
-     table.  This is not applicable to CCL-based coding systems.
-    
-`translation-table-for-encode'
-     The value is a translation table to be applied on encoding.  This is
-     not applicable to CCL-based coding systems.
-     
-`mime-charset'
-     The value is a symbol of which name is `MIME-charset' parameter of
-     the coding system.
-    
-`valid-codes' (meaningful only for a coding system based on CCL)
-     The value is a list to indicate valid byte ranges of the encoded
-     file.  Each element of the list is an integer or a cons of integer.
-     In the former case, the integer value is a valid byte code.  In the
-     latter case, the integers specifies the range of valid byte codes.
-
-The following properties are used by `default-query-coding-region',
-the default implementation of `query-coding-region'. This
-implementation and these properties are not used by the Unicode coding
-systems, nor by those CCL coding systems created with
-`make-8-bit-coding-system'. 
-
-`safe-chars'
-     The value is a char table.  If a character has non-nil value in it,
-     the character is safely supported by the coding system.  
-     Under XEmacs, for the moment, this is used in addition to the
-     `safe-charsets' property. It does not override it as it does
-     under GNU Emacs. #### We need to consider if we should keep this
-     behaviour.
-   
-`safe-charsets'
-     The value is a list of charsets safely supported by the coding
-     system.  For coding systems based on ISO 2022, XEmacs may try to
-     encode characters outside these character sets, but outside of
-     East Asia and East Asian coding systems, it is unlikely that
-     consumers of the data will understand XEmacs' encoding.
-     The value t means that all XEmacs character sets handles are supported.  
-
-The following additional property is recognized if TYPE is `convert-eol':
-
-`subtype'
-     One of `lf', `crlf', `cr' or nil (for autodetection).  When decoding,
-     the corresponding sequence will be converted to LF.  When encoding,
-     the opposite happens.  This coding system converts characters to
-     characters.
-
-
-
-The following additional properties are recognized if TYPE is `iso2022':
-
-`charset-g0'
-`charset-g1'
-`charset-g2'
-`charset-g3'
-     The character set initially designated to the G0 - G3 registers.
-     The value should be one of
-
-          -- A charset object (designate that character set)
-	  -- nil (do not ever use this register)
-	  -- t (no character set is initially designated to
-		the register, but may be later on; this automatically
-		sets the corresponding `force-g*-on-output' property)
-
-`force-g0-on-output'
-`force-g1-on-output'
-`force-g2-on-output'
-`force-g2-on-output'
-     If non-nil, send an explicit designation sequence on output before
-     using the specified register.
-
-`short'
-     If non-nil, use the short forms "ESC $ @", "ESC $ A", and
-     "ESC $ B" on output in place of the full designation sequences
-     "ESC $ ( @", "ESC $ ( A", and "ESC $ ( B".
-
-`no-ascii-eol'
-     If non-nil, don't designate ASCII to G0 at each end of line on output.
-     Setting this to non-nil also suppresses other state-resetting that
-     normally happens at the end of a line.
-
-`no-ascii-cntl'
-     If non-nil, don't designate ASCII to G0 before control chars on output.
-
-`seven'
-     If non-nil, use 7-bit environment on output.  Otherwise, use 8-bit
-     environment.
-
-`lock-shift'
-     If non-nil, use locking-shift (SO/SI) instead of single-shift
-     or designation by escape sequence.
-
-`no-iso6429'
-     If non-nil, don't use ISO6429's direction specification.
-
-`escape-quoted'
-     If non-nil, literal control characters that are the same as
-     the beginning of a recognized ISO2022 or ISO6429 escape sequence
-     (in particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E),
-     SS3 (0x8F), and CSI (0x9B)) are "quoted" with an escape character
-     so that they can be properly distinguished from an escape sequence.
-     (Note that doing this results in a non-portable encoding.) This
-     encoding flag is used for byte-compiled files.  Note that ESC
-     is a good choice for a quoting character because there are no
-     escape sequences whose second byte is a character from the Control-0
-     or Control-1 character sets; this is explicitly disallowed by the
-     ISO2022 standard.
-
-`input-charset-conversion'
-     A list of conversion specifications, specifying conversion of
-     characters in one charset to another when decoding is performed.
-     Each specification is a list of two elements: the source charset,
-     and the destination charset.
-
-`output-charset-conversion'
-     A list of conversion specifications, specifying conversion of
-     characters in one charset to another when encoding is performed.
-     The form of each specification is the same as for
-     `input-charset-conversion'.
-
-
-
-The following additional properties are recognized (and required)
-if TYPE is `ccl':
-
-`decode'
-     CCL program used for decoding (converting to internal format).
-
-`encode'
-     CCL program used for encoding (converting to external format).
-
-
-The following additional properties are recognized if TYPE is `chain':
-
-`chain'
-     List of coding systems to be chained together, in decoding order.
-
-`canonicalize-after-coding'
-     Coding system to be returned by the detector routines in place of
-     this coding system.
-
-
-
-The following additional properties are recognized if TYPE is `unicode':
-
-`unicode-type'
-     One of `utf-16', `utf-8', `ucs-4', or `utf-7' (the latter is not
-     yet implemented).  `utf-16' is the basic two-byte encoding;
-     `ucs-4' is the four-byte encoding; `utf-8' is an ASCII-compatible
-     variable-width 8-bit encoding; `utf-7' is a 7-bit encoding using
-     only characters that will safely pass through all mail gateways.
-     [[ This should be \"transformation format\".  There should also be
-     `ucs-2' (or `bmp' -- no surrogates) and `utf-32' (range checked). ]]
-
-`little-endian'
-     If non-nil, `utf-16' and `ucs-4' will write out the groups of two
-     or four bytes little-endian instead of big-endian.  This is required,
-     for example, under Windows.
-
-`need-bom'
-     If non-nil, a byte order mark (BOM, or Unicode FFFE) should be
-     written out at the beginning of the data.  This serves both to
-     identify the endianness of the following data and to mark the
-     data as Unicode (at least, this is how Windows uses it).
-     [[ The correct term is \"signature\", since this technique may also
-     be used with UTF-8.  That is the term used in the standard. ]]
-
-
-The following additional properties are recognized if TYPE is
-`mswindows-multibyte':
-
-`code-page'
-     Either a number (specifying a particular code page) or one of the
-     symbols `ansi', `oem', `mac', or `ebcdic', specifying the ANSI,
-     OEM, Macintosh, or EBCDIC code page associated with a particular
-     locale (given by the `locale' property).  NOTE: EBCDIC code pages
-     only exist in Windows 2000 and later.
-
-`locale'
-     If `code-page' is a symbol, this specifies the locale whose code
-     page of the corresponding type should be used.  This should be
-     one of the following: A cons of two strings, (LANGUAGE
-     . SUBLANGUAGE) (see `mswindows-set-current-locale'); a string (a
-     language; SUBLANG_DEFAULT, i.e. the default sublanguage, is
-     used); or one of the symbols `current', `user-default', or
-     `system-default', corresponding to the values of
-     `mswindows-current-locale', `mswindows-user-default-locale', or
-     `mswindows-system-default-locale', respectively.
-
-
-
-The following additional properties are recognized if TYPE is `undecided':
-\[[ Doesn't GNU use \"detect-*\" for the following two? ]]
-
-`do-eol'
-     Do EOL detection.
-
-`do-coding'
-     Do encoding detection.
-
-`coding-system'
-     If encoding detection is not done, use the specified coding system
-     to do decoding.  This is used internally when implementing coding
-     systems with an EOL type that specifies autodetection (the default),
-     so that the detector routines return the proper subsidiary.
-
-
-
-The following additional property is recognized if TYPE is `gzip':
-
-`level'
-     Compression level: 0 through 9, or `default' (currently 6).
+DEFUN ("make-coding-system-internal", Fmake_coding_system_internal, 2, 4, 0, /*
+See `make-coding-system'.  This does much of the work of that function.
+
+Without Mule support, it does all the work of that function, and an alias
+exists, mapping `make-coding-system' to
+`make-coding-system-internal'. You'll need a non-Mule XEmacs to read the
+complete docstring. Or you can just read it in make-coding-system.el;
+something like the following should work:
+
+ \\[find-function-other-window] find-file RET \\[find-file] mule/make-coding-system.el RET
 
 */
        (name, type, description, props))
@@ -1577,16 +1468,6 @@
   return new_coding_system;
 }
 
-DEFUN ("coding-system-canonical-name-p", Fcoding_system_canonical_name_p,
-       1, 1, 0, /*
-Return t if OBJECT names a coding system, and is not a coding system alias.
-*/
-       (object))
-{
-  return CODING_SYSTEMP (Fgethash (object, Vcoding_system_hash_table, Qnil))
-    ? Qt : Qnil;
-}
-
 /* #### Shouldn't this really be a find/get pair? */
 
 DEFUN ("coding-system-alias-p", Fcoding_system_alias_p, 1, 1, 0, /*
@@ -2475,6 +2356,100 @@
 				      CODING_ENCODE);
 }
 
+DEFUN ("query-coding-region", Fquery_coding_region, 3, 7, 0, /*
+Work out whether CODING-SYSTEM can losslessly encode a region.
+
+START and END are the beginning and end of the region to check.
+CODING-SYSTEM is the coding system to try.
+
+Optional argument BUFFER is the buffer to check, and defaults to the current
+buffer.
+
+IGNORE-INVALID-SEQUENCESP, also an optional argument, says to treat XEmacs
+characters which have an unambiguous encoded representation, despite being
+undefined in what they represent, as encodable.  These chiefly arise with
+variable-length encodings like UTF-8 and UTF-16, where an invalid sequence
+is passed through to XEmacs as a sequence of characters with a defined
+correspondence to the octets on disk, but no non-error semantics; see the
+`invalid-sequence-coding-system' argument to `set-language-info'.
+
+They can also arise with fixed-length encodings like ISO 8859-7, where
+certain octets on disk have undefined values, and treating them as
+corresponding to the ISO 8859-1 characters with the same numerical values
+may lead to data that is not understood by other applications.
+
+Optional argument ERRORP says to signal a `text-conversion-error' if some
+character in the region cannot be encoded, and defaults to nil.
+
+Optional argument HIGHLIGHT says to display unencodable characters in the
+region using `query-coding-warning-face'. It defaults to nil.
+
+This function can return multiple values; the intention is that callers use
+`multiple-value-bind' or the related CL multiple value functions to deal
+with it.  The first result is `t' if the region can be encoded using
+CODING-SYSTEM, or `nil' if not.  If the region cannot be encoded using
+CODING-SYSTEM, the second result is a range table describing the positions
+of the unencodable characters.
+
+Ranges that describe characters that would be ignored were
+IGNORE-INVALID-SEQUENCESP non-nil map to the symbol `invalid-sequence';
+other ranges map to the symbol `unencodable'.  If IGNORE-INVALID-SEQUENCESP
+is non-nil, all ranges will map to the symbol `unencodable'.  See
+`make-range-table' for more details of range tables.
+*/
+       (start, end, coding_system, buffer, ignore_invalid_sequencesp,
+        errorp, highlight))
+{
+  Charbpos b, e;
+  struct buffer *buf = decode_buffer (buffer, 1);
+  Lisp_Object result;
+  int flags = 0, speccount = specpdl_depth (); 
+
+  coding_system = Fget_coding_system (coding_system);
+
+  get_buffer_range_char (buf, start, end, &b, &e, 0);
+
+  if (buf != current_buffer)
+    {
+      record_unwind_protect (save_current_buffer_restore, Fcurrent_buffer ());
+      set_buffer_internal (buf);
+    }
+
+  record_unwind_protect (save_excursion_restore, save_excursion_save ());
+
+  BUF_SET_PT (buf, b);
+
+  if (!NILP (ignore_invalid_sequencesp))
+    {
+      flags |= QUERY_METHOD_IGNORE_INVALID_SEQUENCES;
+    }
+
+  if (!NILP (errorp))
+    {
+      flags |= QUERY_METHOD_ERRORP;
+    }
+
+  if (!NILP (highlight))
+    {
+      flags |= QUERY_METHOD_HIGHLIGHT;
+    }
+
+  result = XCODESYSMETH_OR_GIVEN (coding_system, query,
+                                  (coding_system, buf, e, flags), Qunbound);
+
+  if (UNBOUNDP (result))
+    {
+      signal_error (Qtext_conversion_error,
+                    "Coding system doesn't say what it can encode", 
+                    XCODING_SYSTEM_NAME (coding_system));
+    }
+
+  result = (NILP (result)) ? Qt : values2 (Qnil, result); 
+
+  return unbind_to_1 (speccount, result);
+}
+
+
 
 /************************************************************************/
 /*                             Chain methods                            */
@@ -4550,7 +4525,7 @@
   DEFSUBR (Fget_coding_system);
   DEFSUBR (Fcoding_system_list);
   DEFSUBR (Fcoding_system_name);
-  DEFSUBR (Fmake_coding_system);
+  DEFSUBR (Fmake_coding_system_internal);
   DEFSUBR (Fcopy_coding_system);
   DEFSUBR (Fcoding_system_canonical_name_p);
   DEFSUBR (Fcoding_system_alias_p);
@@ -4573,6 +4548,7 @@
   DEFSUBR (Fdetect_coding_region);
   DEFSUBR (Fdecode_coding_region);
   DEFSUBR (Fencode_coding_region);
+  DEFSUBR (Fquery_coding_region);
   DEFSYMBOL_MULTIWORD_PREDICATE (Qcoding_systemp);
   DEFSYMBOL (Qno_conversion);
   DEFSYMBOL (Qconvert_eol);
@@ -4621,6 +4597,10 @@
 
   DEFSYMBOL (Qescape_quoted);
 
+  DEFSYMBOL (Qquery_coding_warning_face);
+  DEFSYMBOL (Qaliases);
+  DEFSYMBOL (Qcharset_skip_chars_string);
+
 #ifdef HAVE_ZLIB
   DEFSYMBOL (Qgzip);
 #endif
@@ -4844,6 +4824,12 @@
 */ );
   Vdebug_coding_detection = Qnil;
 #endif
+
+#ifdef MULE
+  Vdefault_query_coding_region_chartab_cache
+    = make_lisp_hash_table (25, HASH_TABLE_NON_WEAK, HASH_TABLE_EQUAL);
+  staticpro (&Vdefault_query_coding_region_chartab_cache);
+#endif
 }
 
 /* #### reformat this for consistent appearance? */
@@ -4851,7 +4837,7 @@
 void
 complex_vars_of_file_coding (void)
 {
-  Fmake_coding_system
+  Fmake_coding_system_internal
     (Qconvert_eol_cr, Qconvert_eol,
      build_msg_string ("Convert CR to LF"),
      nconc2 (list6 (Qdocumentation,
@@ -4863,9 +4849,10 @@
 	     /* VERY IMPORTANT!  Tell make-coding-system not to generate
 		subsidiaries -- it needs the coding systems we're creating
 		to do so! */
-	     list2 (Qeol_type, Qlf)));
-
-  Fmake_coding_system
+	     list4 (Qeol_type, Qlf,
+                    Qsafe_charsets, Qt)));
+
+  Fmake_coding_system_internal
     (Qconvert_eol_lf, Qconvert_eol,
      build_msg_string ("Convert LF to LF (do nothing)"),
      nconc2 (list6 (Qdocumentation,
@@ -4876,9 +4863,10 @@
 	     /* VERY IMPORTANT!  Tell make-coding-system not to generate
 		subsidiaries -- it needs the coding systems we're creating
 		to do so! */
-	     list2 (Qeol_type, Qlf)));
-
-  Fmake_coding_system
+	     list4 (Qeol_type, Qlf,
+                    Qsafe_charsets, Qt)));
+
+  Fmake_coding_system_internal
     (Qconvert_eol_crlf, Qconvert_eol,
      build_msg_string ("Convert CRLF to LF"),
      nconc2 (list6 (Qdocumentation,
@@ -4887,12 +4875,14 @@
 "(used internally and under Unix to mark the end of a line)."),
 		    Qmnemonic, build_string ("CRLF->LF"),
 		    Qsubtype, Qcrlf),
+
 	     /* VERY IMPORTANT!  Tell make-coding-system not to generate
 		subsidiaries -- it needs the coding systems we're creating
 		to do so! */
-	     list2 (Qeol_type, Qlf)));
-
-  Fmake_coding_system
+	     list4 (Qeol_type, Qlf,
+                    Qsafe_charsets, Qt)));
+
+  Fmake_coding_system_internal
     (Qconvert_eol_autodetect, Qconvert_eol,
      build_msg_string ("Autodetect EOL type"),
      nconc2 (list6 (Qdocumentation,
@@ -4903,9 +4893,10 @@
 	     /* VERY IMPORTANT!  Tell make-coding-system not to generate
 		subsidiaries -- it needs the coding systems we're creating
 		to do so! */
-	     list2 (Qeol_type, Qlf)));
-
-  Fmake_coding_system
+	     list4 (Qeol_type, Qlf,
+                    Qsafe_charsets, Qt)));
+
+  Fmake_coding_system_internal
     (Qundecided, Qundecided,
      build_msg_string ("Undecided (auto-detect)"),
      nconc2 (list4 (Qdocumentation,
@@ -4918,7 +4909,7 @@
 		       though, I don't think.) */
 		    Qeol_type, Qlf)));
 
-  Fmake_coding_system
+  Fmake_coding_system_internal
     (intern ("undecided-dos"), Qundecided,
      build_msg_string ("Undecided (auto-detect) (CRLF)"),
      nconc2 (list4 (Qdocumentation,
@@ -4928,7 +4919,7 @@
 	     list4 (Qdo_coding, Qt,
 		    Qeol_type, Qcrlf)));
 
-  Fmake_coding_system
+  Fmake_coding_system_internal
     (intern ("undecided-unix"), Qundecided,
      build_msg_string ("Undecided (auto-detect) (LF)"),
      nconc2 (list4 (Qdocumentation,
@@ -4938,7 +4929,7 @@
 	     list4 (Qdo_coding, Qt,
 		    Qeol_type, Qlf)));
 
-  Fmake_coding_system
+  Fmake_coding_system_internal
     (intern ("undecided-mac"), Qundecided,
      build_msg_string ("Undecided (auto-detect) (CR)"),
      nconc2 (list4 (Qdocumentation,
@@ -4949,26 +4940,42 @@
 		    Qeol_type, Qcr)));
 
   /* Need to create this here or we're really screwed. */
-  Fmake_coding_system
+  Fmake_coding_system_internal
     (Qraw_text, Qno_conversion,
      build_msg_string ("Raw Text"),
-     list4 (Qdocumentation,
-	    build_msg_string ("Raw text converts only line-break codes, and acts otherwise like `binary'."),
-	    Qmnemonic, build_string ("Raw")));
-
-  Fmake_coding_system
+     nconc2 (list4 (Qdocumentation,
+                    build_msg_string ("Raw text converts only line-break "
+                                      "codes, and acts otherwise like "
+                                      "`binary'."),
+                    Qmnemonic, build_string ("Raw")),
+#ifdef MULE
+             list2 (Qsafe_charsets, list3 (Vcharset_ascii, Vcharset_control_1,
+                                           Vcharset_latin_iso8859_1))));
+
+#else
+             Qnil));
+#endif
+
+  Fmake_coding_system_internal
     (Qbinary, Qno_conversion,
      build_msg_string ("Binary"),
-     list6 (Qdocumentation,
-	    build_msg_string (
+     nconc2 (list6 (Qdocumentation,
+                    build_msg_string (
 "This coding system is as close as it comes to doing no conversion.\n"
 "On input, each byte is converted directly into the character\n"
 "with the corresponding code -- i.e. from the `ascii', `control-1',\n"
 "or `latin-1' character sets.  On output, these characters are\n"
 "converted back to the corresponding bytes, and other characters\n"
 "are converted to the default character, i.e. `~'."),
-	    Qeol_type, Qlf,
-	    Qmnemonic, build_string ("Binary")));
+                    Qeol_type, Qlf,
+                    Qmnemonic, build_string ("Binary")),
+#ifdef MULE
+             list2 (Qsafe_charsets, list3 (Vcharset_ascii, Vcharset_control_1,
+                                           Vcharset_latin_iso8859_1))));
+
+#else
+             Qnil));
+#endif
 
   /* Formerly aliased to raw-text!  Completely bogus and not even the same
      as FSF Emacs. */
author	Aidan Kehoe <kehoea@parhasard.net>
date	Sat, 19 Sep 2009 22:53:13 +0100
parents	e4ed58cb0e5b
children	a9833e8a32ec e0db3c197671