diff src/file-coding.c @ 4690:257b468bf2ca
Move the #'query-coding-region implementation to C.
This is necessary because there is no reasonable way to access the
corresponding mswindows-multibyte functionality from Lisp, and we need such
functionality if we're going to have a reliable and portable
#'query-coding-region implementation. However, this change doesn't yet
provide #'query-coding-region for the mswindows-multibyte coding systems;
there should be no functional differences between an XEmacs with this change
and one without it.
src/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
Move the #'query-coding-region implementation to C.
This is necessary because there is no reasonable way to access the
corresponding mswindows-multibyte functionality from Lisp, and we
need such functionality if we're going to have a reliable and
portable #'query-coding-region implementation. However, this
change doesn't yet provide #'query-coding-region for the
mswindows-multibyte coding systems; there should be no functional
differences between an XEmacs with this change and one without it.
* mule-coding.c (struct fixed_width_coding_system):
Add a new coding system type, fixed_width, and implement it. It
uses the CCL infrastructure but has a much simpler creation API,
and its own query_method, formerly in lisp/mule/mule-coding.el.
* unicode.c:
Move the Unicode query method implementation here from
unicode.el.
* lisp.h: Declare Fmake_coding_system_internal, Fcopy_range_table
here.
* intl-win32.c (complex_vars_of_intl_win32):
Use Fmake_coding_system_internal, not Fmake_coding_system.
* general-slots.h: Add Qsucceeded, Qunencodable, Qinvalid_sequence
here.
* file-coding.h (enum coding_system_variant):
Add fixed_width_coding_system here.
(struct coding_system_methods):
Add query_method and query_lstream_method to the coding system
methods.
Provide flags for the query methods.
Declare the default query method; initialise it correctly in
INITIALIZE_CODING_SYSTEM_TYPE.
* file-coding.c (default_query_method):
New function, the default query method for coding systems that do
not set it. Moved from coding.el.
(make_coding_system_1):
Accept new elements in PROPS in #'make-coding-system; aliases, a
list of aliases; safe-chars and safe-charsets (these were
previously accepted but not saved); and category.
(Fmake_coding_system_internal):
New function, what used to be #'make-coding-system--on Mule
builds, we've now moved some of the functionality of this to
Lisp.
(Fcoding_system_canonical_name_p):
Move this earlier in the file, since it's now called from within
make_coding_system_1.
(Fquery_coding_region):
Move the implementation of this here, from coding.el.
(complex_vars_of_file_coding):
Call Fmake_coding_system_internal, not Fmake_coding_system;
specify safe-charsets properties when we're a mule build.
* extents.h (mouse_highlight_priority, Fset_extent_priority,
Fset_extent_face, Fmap_extents):
Make these available to other C files.
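
The scan performed by the default query method can be illustrated outside XEmacs. The following is a minimal sketch, not the code from this changeset: `fail_range`, `is_safe` and `query_ranges` are hypothetical stand-ins for the range table, the safe-chars char table lookup and the query method, and the "coding system" is assumed to encode ASCII only. It records each maximal run of unencodable characters as a start-closed, end-open range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one range-table entry: [start, end). */
typedef struct { size_t start, end; } fail_range;

/* Hypothetical safe-chars test: pretend the coding system can only
   encode ASCII.  The real code consults a char table instead. */
static int
is_safe (unsigned int ch)
{
  return ch < 0x80;
}

/* Scan text[0..len) and record each maximal run of unencodable
   characters.  Returns the number of ranges stored in OUT; runs found
   after OUT is full are dropped. */
static size_t
query_ranges (const unsigned int *text, size_t len,
              fail_range *out, size_t max_out)
{
  size_t pos = 0, n = 0;

  assert (text != NULL && out != NULL);

  while (pos < len)
    {
      if (is_safe (text[pos]))
        {
          pos++;
          continue;
        }

      /* Start of a failing run; extend it as far as it goes. */
      size_t start = pos;
      while (pos < len && !is_safe (text[pos]))
        pos++;

      if (n < max_out)
        {
          out[n].start = start;
          out[n].end = pos;
          n++;
        }
    }

  return n;
}
```

An empty result corresponds to the "region can be encoded" case; any recorded range corresponds to an entry that the real method would map to `unencodable`.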
lisp/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
Move the #'query-coding-region implementation to C.
* coding.el:
Consolidate code that depends on the presence or absence of Mule
at the end of this file.
(default-query-coding-region, query-coding-region):
Move these functions to C.
(default-query-coding-region-safe-charset-skip-chars-map):
Remove this variable; the corresponding C variable is
Vdefault_query_coding_region_chartab_cache in file-coding.c.
(query-coding-string): Update docstring to reflect actual multiple
values, be more careful about not modifying a range table that
we're currently mapping over.
(encode-coding-char): Make the implementation of this simpler.
(featurep 'mule): Autoload #'make-coding-system from
mule/make-coding-system.el if we're a mule build; provide an
appropriate compiler macro.
Do various non-mule compatibility things if we're not a mule
build.
* update-elc.el (additional-dump-dependencies):
Add mule/make-coding-system as a dump time dependency if we're a
mule build.
* unicode.el (ccl-encode-to-ucs-2):
(decode-char):
(encode-char):
Move these earlier in the file, for the sake of some byte compile
warnings.
(unicode-query-coding-region):
Move this to unicode.c.
* mule/make-coding-system.el:
New file, not dumped. Contains the functionality to rework the
arguments necessary for fixed-width coding systems, and contains
the implementation of #'make-coding-system, which now calls
#'make-coding-system-internal.
* mule/vietnamese.el (viscii):
* mule/latin.el (iso-8859-2):
(windows-1250):
(iso-8859-3):
(iso-8859-4):
(iso-8859-14):
(iso-8859-15):
(iso-8859-16):
(iso-8859-9):
(macintosh):
(windows-1252):
* mule/hebrew.el (iso-8859-8):
* mule/greek.el (iso-8859-7):
(windows-1253):
* mule/cyrillic.el (iso-8859-5):
(koi8-r):
(koi8-u):
(windows-1251):
(alternativnyj):
(koi8-ru):
(koi8-t):
(koi8-c):
(koi8-o):
* mule/arabic.el (iso-8859-6):
(windows-1256):
Move all these coding systems to being of type fixed-width, not of
type CCL. This allows the distinct query-coding-region for them to
be in C, something which will eventually allow us to implement
query-coding-region for the mswindows-multibyte coding systems.
* mule/general-late.el (posix-charset-to-coding-system-hash):
Document why we're pre-emptively persuading the byte compiler that
the ELC for this file needs to be written using escape-quoted.
Call #'set-unicode-query-skip-chars-args, now the Unicode
query-coding-region implementation is in C.
* mule/thai-xtis.el (tis-620):
Don't bother checking whether we're XEmacs or not here.
* mule/mule-coding.el:
Move the eight bit fixed-width functionality from this file to
make-coding-system.el.
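
Why a fixed-width type makes a C-level query method straightforward can be sketched as follows. This is an illustration under stated assumptions, not the implementation in mule-coding.c: `fixed_width_cs`, `UNDEFINED_OCTET`, `init_toy_cs` and `encode_char` are hypothetical names, and the toy table defines only ASCII plus the euro sign at octet 0xA4 (its position in ISO 8859-15). The point is that an 8-bit fixed-width coding system is fully described by a 256-entry table, so asking whether a character is encodable reduces to a table search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for octets the coding system leaves unassigned. */
#define UNDEFINED_OCTET 0xFFFFFFFFu

/* Hypothetical fixed-width coding system: octet -> code point. */
typedef struct { unsigned int decode[256]; } fixed_width_cs;

/* Build a toy coding system: ASCII maps to itself, octet 0xA4 maps to
   U+20AC, and every other octet is undefined. */
static void
init_toy_cs (fixed_width_cs *cs)
{
  assert (cs != NULL);

  for (int i = 0; i < 256; i++)
    cs->decode[i] = (i < 0x80) ? (unsigned int) i : UNDEFINED_OCTET;
  cs->decode[0xA4] = 0x20AC;
}

/* Return the octet that encodes CH, or -1 if CS cannot encode it. */
static int
encode_char (const fixed_width_cs *cs, unsigned int ch)
{
  assert (cs != NULL);

  for (int i = 0; i < 256; i++)
    if (ch != UNDEFINED_OCTET && cs->decode[i] == ch)
      return i;

  return -1;
}
```

A linear search is used here only for brevity; a reverse lookup table would serve the same purpose.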
tests/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
* automated/mule-tests.el:
Check a coding system's type, not an 8-bit-fixed property, for
whether that coding system should be treated as a fixed-width
coding system.
* automated/query-coding-tests.el:
Don't test the query coding functionality for mswindows-multibyte
coding systems; it's not yet implemented.
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Sat, 19 Sep 2009 22:53:13 +0100 |
parents | e4ed58cb0e5b |
children | a9833e8a32ec e0db3c197671 |
--- a/src/file-coding.c Sat Sep 19 17:56:23 2009 +0200
+++ b/src/file-coding.c Sat Sep 19 22:53:13 2009 +0100
@@ -78,6 +78,9 @@
 #include "lstream.h"
 #include "opaque.h"
 #include "file-coding.h"
+#include "extents.h"
+#include "rangetab.h"
+#include "chartab.h"

 #ifdef HAVE_ZLIB
 #include "zlib.h"
@@ -89,10 +92,17 @@
 Lisp_Object Vcoding_system_for_write;
 Lisp_Object Vfile_name_coding_system;

+Lisp_Object Qaliases, Qcharset_skip_chars_string;
+
 #ifdef DEBUG_XEMACS
 Lisp_Object Vdebug_coding_detection;
 #endif

+#ifdef MULE
+extern Lisp_Object Vcharset_ascii, Vcharset_control_1,
+  Vcharset_latin_iso8859_1;
+#endif
+
 typedef struct coding_system_type_entry
 {
   struct coding_system_methods *meths;
@@ -417,6 +427,155 @@
   return decode_coding_system_type (type, ERROR_ME_NOT) != 0;
 }

+#ifdef MULE
+static Lisp_Object Vdefault_query_coding_region_chartab_cache;
+
+/* Non-static because it's used in INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA. */
+Lisp_Object
+default_query_method (Lisp_Object codesys, struct buffer *buf,
+                      Charbpos end, int flags)
+{
+  Charbpos pos = BUF_PT (buf), fail_range_start, fail_range_end;
+  Charbpos pos_byte = BYTE_BUF_PT (buf);
+  Lisp_Object safe_charsets = XCODING_SYSTEM_SAFE_CHARSETS (codesys);
+  Lisp_Object safe_chars = XCODING_SYSTEM_SAFE_CHARS (codesys),
+    result = Qnil;
+  enum query_coding_failure_reasons failed_reason,
+    previous_failed_reason = query_coding_succeeded;
+
+  /* safe-charsets of t means the coding system can encode everything. */
+  if (EQ (Qnil, safe_chars))
+    {
+      if (EQ (Qt, safe_charsets))
+        {
+          return Qnil;
+        }
+
+      /* If we've no information on what characters the coding system can
+         encode, give up. */
+      if (EQ (Qnil, safe_charsets) && EQ (Qnil, safe_chars))
+        {
+          return Qunbound;
+        }
+
+      safe_chars = Fgethash (safe_charsets,
+                             Vdefault_query_coding_region_chartab_cache,
+                             Qnil);
+      if (NILP (safe_chars))
+        {
+          safe_chars = Fmake_char_table (Qgeneric);
+          {
+            EXTERNAL_LIST_LOOP_2 (safe_charset, safe_charsets)
+              Fput_char_table (safe_charset, Qt, safe_chars);
+          }
+
+          Fputhash (safe_charsets, safe_chars,
+                    Vdefault_query_coding_region_chartab_cache);
+        }
+    }
+
+  if (flags & QUERY_METHOD_HIGHLIGHT &&
+      /* If we're being called really early, live without highlights getting
+         cleared properly: */
+      !(UNBOUNDP (XSYMBOL (Qquery_coding_clear_highlights)->function)))
+    {
+      /* It's okay to call Lisp here, the only non-stack object we may have
+         allocated up to this point is safe_chars, and that's
+         reachable from its entry in
+         Vdefault_query_coding_region_chartab_cache */
+      call3 (Qquery_coding_clear_highlights, make_int (pos), make_int (end),
+             wrap_buffer (buf));
+    }
+
+  while (pos < end)
+    {
+      Ichar ch = BYTE_BUF_FETCH_CHAR (buf, pos_byte);
+      if (!EQ (Qnil, get_char_table (ch, safe_chars)))
+        {
+          pos++;
+          INC_BYTEBPOS (buf, pos_byte);
+        }
+      else
+        {
+          fail_range_start = pos;
+          while ((pos < end) &&
+                 (EQ (Qnil, get_char_table (ch, safe_chars))
+                  && (failed_reason = query_coding_unencodable))
+                 && (previous_failed_reason == query_coding_succeeded
+                     || previous_failed_reason == failed_reason))
+            {
+              pos++;
+              INC_BYTEBPOS (buf, pos_byte);
+              ch = BYTE_BUF_FETCH_CHAR (buf, pos_byte);
+              previous_failed_reason = failed_reason;
+            }
+
+          if (fail_range_start == pos)
+            {
+              /* The character can actually be encoded; move on. */
+              pos++;
+              INC_BYTEBPOS (buf, pos_byte);
+            }
+          else
+            {
+              assert (previous_failed_reason == query_coding_unencodable);
+
+              if (flags & QUERY_METHOD_ERRORP)
+                {
+                  DECLARE_EISTRING (error_details);
+
+                  eicpy_ascii (error_details, "Cannot encode ");
+                  eicat_lstr (error_details,
+                              make_string_from_buffer (buf, fail_range_start,
+                                                       pos -
+                                                       fail_range_start));
+                  eicat_ascii (error_details, " using coding system");
+
+                  signal_error (Qtext_conversion_error,
+                                (const CIbyte *)(eidata (error_details)),
+                                XCODING_SYSTEM_NAME (codesys));
+                }
+
+              if (NILP (result))
+                {
+                  result = Fmake_range_table (Qstart_closed_end_open);
+                }
+
+              fail_range_end = pos;
+
+              Fput_range_table (make_int (fail_range_start),
+                                make_int (fail_range_end),
+                                Qunencodable,
+                                result);
+              previous_failed_reason = query_coding_succeeded;
+
+              if (flags & QUERY_METHOD_HIGHLIGHT)
+                {
+                  Lisp_Object extent
+                    = Fmake_extent (make_int (fail_range_start),
+                                    make_int (fail_range_end),
+                                    wrap_buffer (buf));
+
+                  Fset_extent_priority
+                    (extent, make_int (2 + mouse_highlight_priority));
+                  Fset_extent_face (extent, Qquery_coding_warning_face);
+                }
+            }
+        }
+    }
+
+  return result;
+}
+#else
+Lisp_Object
+default_query_method (Lisp_Object UNUSED (codesys),
+                      struct buffer * UNUSED (buf),
+                      Charbpos UNUSED (end), int UNUSED (flags))
+{
+  return Qnil;
+}
+#endif /* defined MULE */
+
 DEFUN ("valid-coding-system-type-p", Fvalid_coding_system_type_p, 1, 1, 0, /*
 Given a CODING-SYSTEM-TYPE, return non-nil if it is valid.
 Valid types depend on how XEmacs was compiled but may include
@@ -982,6 +1141,16 @@
     }
 }

+DEFUN ("coding-system-canonical-name-p", Fcoding_system_canonical_name_p,
+       1, 1, 0, /*
+Return t if OBJECT names a coding system, and is not a coding system alias.
+*/
+       (object))
+{
+  return CODING_SYSTEMP (Fgethash (object, Vcoding_system_hash_table, Qnil))
+    ? Qt : Qnil;
+}
+
 /* Basic function to create new coding systems. For `make-coding-system',
    NAME-OR-EXISTING is the NAME argument, PREFIX is null, and TYPE,
    DESCRIPTION, and PROPS are the same. All created coding systems are put
@@ -1030,7 +1199,7 @@
   enum eol_type eol_wrapper = EOL_AUTODETECT;
   struct coding_system_methods *meths;
   Lisp_Object csobj;
-  Lisp_Object defmnem = Qnil;
+  Lisp_Object defmnem = Qnil, aliases = Qnil;

   if (NILP (type))
     type = Qundecided;
@@ -1119,15 +1288,55 @@
             CODING_SYSTEM_POST_READ_CONVERSION (cs) = value;
           else if (EQ (key, Qpre_write_conversion))
             CODING_SYSTEM_PRE_WRITE_CONVERSION (cs) = value;
+          else if (EQ (key, Qaliases))
+            {
+              EXTERNAL_LIST_LOOP_2 (alias, value)
+                {
+                  CHECK_SYMBOL (alias);
+
+                  if (!NILP (Fcoding_system_canonical_name_p (alias)))
+                    {
+                      invalid_change ("Symbol is the canonical name of a "
+                                      "coding system and cannot be redefined",
+                                      alias);
+                    }
+                }
+              aliases = value;
+            }
           /* FSF compatibility */
           else if (EQ (key, Qtranslation_table_for_decode))
             ;
           else if (EQ (key, Qtranslation_table_for_encode))
             ;
           else if (EQ (key, Qsafe_chars))
-            CODING_SYSTEM_SAFE_CHARS (cs) = value;
+            {
+              CHECK_CHAR_TABLE (value);
+              CODING_SYSTEM_SAFE_CHARS (cs) = value;
+            }
           else if (EQ (key, Qsafe_charsets))
-            CODING_SYSTEM_SAFE_CHARSETS (cs) = value;
+            {
+              if (!EQ (Qt, value)
+                  /* Would be nice to actually do this check, but there are
+                     some order conflicts with japanese.el and
+                     mule-coding.el */
+                  && 0)
+                {
+#ifdef MULE
+                  EXTERNAL_LIST_LOOP_2 (safe_charset, value)
+                    CHECK_CHARSET (Ffind_charset (safe_charset));
+#endif
+                }
+
+              CODING_SYSTEM_SAFE_CHARSETS (cs) = value;
+            }
+          else if (EQ (key, Qcategory))
+            {
+              Fput (name_or_existing, intern ("coding-system-property"),
+                    Fplist_put (Fget (name_or_existing,
+                                      intern ("coding-system-property"),
+                                      Qnil),
+                                Qcategory, value));
+            }
           else if (EQ (key, Qmime_charset))
             ;
           else if (EQ (key, Qvalid_codes))
@@ -1186,6 +1395,11 @@
                                   csobj));
       }
     XCODING_SYSTEM_EOL_TYPE (csobj) = eol_wrapper;
+
+    {
+      EXTERNAL_LIST_LOOP_2 (alias, aliases)
+        Fdefine_coding_system_alias (alias, csobj);
+    }
   }

   return csobj;
@@ -1199,339 +1413,16 @@
   return make_coding_system_1 (existing, prefix, type, description, props);
 }

-DEFUN ("make-coding-system", Fmake_coding_system, 2, 4, 0, /*
-Register symbol NAME as a coding system.
-
-TYPE describes the conversion method used and should be one of
-
-nil or `undecided'
-     Automatic conversion. XEmacs attempts to detect the coding system
-     used in the file.
-`chain'
-     Chain two or more coding systems together to make a combination coding
-     system.
-`no-conversion'
-     No conversion. Use this for binary files and such. On output,
-     graphic characters that are not in ASCII or Latin-1 will be
-     replaced by a ?. (For a no-conversion-encoded buffer, these
-     characters will only be present if you explicitly insert them.)
-`convert-eol'
-     Convert CRLF sequences or CR to LF.
-`shift-jis'
-     Shift-JIS (a Japanese encoding commonly used in PC operating systems).
-`unicode'
-     Any Unicode encoding (UCS-4, UTF-8, UTF-16, etc.).
-`mswindows-unicode-to-multibyte'
-     (MS Windows only) Converts from Windows Unicode to Windows Multibyte
-     (any code page encoding) upon encoding, and the other way upon decoding.
-`mswindows-multibyte'
-     Converts to or from Windows Multibyte (any code page encoding).
-     This is resolved into a chain of `mswindows-unicode' and
-     `mswindows-unicode-to-multibyte'.
-`iso2022'
-     Any ISO2022-compliant encoding. Among other things, this includes
-     JIS (the Japanese encoding commonly used for e-mail), EUC (the
-     standard Unix encoding for Japanese and other languages), and
-     Compound Text (the encoding used in X11). You can specify more
-     specific information about the conversion with the PROPS argument.
-`big5'
-     Big5 (the encoding commonly used for Mandarin Chinese in Taiwan).
-`ccl'
-     The conversion is performed using a user-written pseudo-code
-     program. CCL (Code Conversion Language) is the name of this
-     pseudo-code.
-`gzip'
-     GZIP compression format.
-`internal'
-     Write out or read in the raw contents of the memory representing
-     the buffer's text. This is primarily useful for debugging
-     purposes, and is only enabled when XEmacs has been compiled with
-     DEBUG_XEMACS defined (via the --debug configure option).
-     WARNING: Reading in a file using `internal' conversion can result
-     in an internal inconsistency in the memory representing a
-     buffer's text, which will produce unpredictable results and may
-     cause XEmacs to crash. Under normal circumstances you should
-     never use `internal' conversion.
-
-DESCRIPTION is a short English phrase describing the coding system,
-suitable for use as a menu item. (See also the `documentation' property
-below.)
-
-PROPS is a property list, describing the specific nature of the
-character set. Recognized properties are:
-
-`mnemonic'
-     String to be displayed in the modeline when this coding system is
-     active.
-
-`documentation'
-     Detailed documentation on the coding system.
-
-`eol-type'
-     End-of-line conversion to be used. It should be one of
-
-        nil
-                Automatically detect the end-of-line type (LF, CRLF,
-                or CR). Also generate subsidiary coding systems named
-                `NAME-unix', `NAME-dos', and `NAME-mac', that are
-                identical to this coding system but have an EOL-TYPE
-                value of `lf', `crlf', and `cr', respectively.
-        `lf'
-                The end of a line is marked externally using ASCII LF.
-                Since this is also the way that XEmacs represents an
-                end-of-line internally, specifying this option results
-                in no end-of-line conversion. This is the standard
-                format for Unix text files.
-        `crlf'
-                The end of a line is marked externally using ASCII
-                CRLF. This is the standard format for MS-DOS text
-                files.
-        `cr'
-                The end of a line is marked externally using ASCII CR.
-                This is the standard format for Macintosh text files.
-        t
-                Automatically detect the end-of-line type but do not
-                generate subsidiary coding systems. (This value is
-                converted to nil when stored internally, and
-                `coding-system-property' will return nil.)
-
-`post-read-conversion'
-     The value is a function to call after some text is inserted and
-     decoded by the coding system itself and before any functions in
-     `after-change-functions' are called. (#### Not actually true in
-     XEmacs. `after-change-functions' will be called twice if
-     `post-read-conversion' changes something.) The argument of this
-     function is the same as for a function in
-     `after-insert-file-functions', i.e. LENGTH of the text inserted,
-     with point at the head of the text to be decoded.
-
-`pre-write-conversion'
-     The value is a function to call after all functions in
-     `write-region-annotate-functions' and `buffer-file-format' are
-     called, and before the text is encoded by the coding system itself.
-     The arguments to this function are the same as those of a function
-     in `write-region-annotate-functions', i.e. FROM and TO, specifying
-     a region of text.
-
-
-
-The following properties are allowed for FSF compatibility but currently
-ignored:
-
-`translation-table-for-decode'
-     The value is a translation table to be applied on decoding. See
-     the function `make-translation-table' for the format of translation
-     table. This is not applicable to CCL-based coding systems.
-
-`translation-table-for-encode'
-     The value is a translation table to be applied on encoding. This is
-     not applicable to CCL-based coding systems.
-
-`mime-charset'
-     The value is a symbol of which name is `MIME-charset' parameter of
-     the coding system.
-
-`valid-codes' (meaningful only for a coding system based on CCL)
-     The value is a list to indicate valid byte ranges of the encoded
-     file. Each element of the list is an integer or a cons of integer.
-     In the former case, the integer value is a valid byte code. In the
-     latter case, the integers specifies the range of valid byte codes.
-
-The following properties are used by `default-query-coding-region',
-the default implementation of `query-coding-region'. This
-implementation and these properties are not used by the Unicode coding
-systems, nor by those CCL coding systems created with
-`make-8-bit-coding-system'.
-
-`safe-chars'
-     The value is a char table. If a character has non-nil value in it,
-     the character is safely supported by the coding system.
-     Under XEmacs, for the moment, this is used in addition to the
-     `safe-charsets' property. It does not override it as it does
-     under GNU Emacs. #### We need to consider if we should keep this
-     behaviour.
-
-`safe-charsets'
-     The value is a list of charsets safely supported by the coding
-     system. For coding systems based on ISO 2022, XEmacs may try to
-     encode characters outside these character sets, but outside of
-     East Asia and East Asian coding systems, it is unlikely that
-     consumers of the data will understand XEmacs' encoding.
-     The value t means that all XEmacs character sets handles are supported.
-
-The following additional property is recognized if TYPE is `convert-eol':
-
-`subtype'
-     One of `lf', `crlf', `cr' or nil (for autodetection). When decoding,
-     the corresponding sequence will be converted to LF. When encoding,
-     the opposite happens. This coding system converts characters to
-     characters.
-
-
-
-The following additional properties are recognized if TYPE is `iso2022':
-
-`charset-g0'
-`charset-g1'
-`charset-g2'
-`charset-g3'
-     The character set initially designated to the G0 - G3 registers.
-     The value should be one of
-
-          -- A charset object (designate that character set)
-          -- nil (do not ever use this register)
-          -- t (no character set is initially designated to
-             the register, but may be later on; this automatically
-             sets the corresponding `force-g*-on-output' property)
-
-`force-g0-on-output'
-`force-g1-on-output'
-`force-g2-on-output'
-`force-g2-on-output'
-     If non-nil, send an explicit designation sequence on output before
-     using the specified register.
-
-`short'
-     If non-nil, use the short forms "ESC $ @", "ESC $ A", and
-     "ESC $ B" on output in place of the full designation sequences
-     "ESC $ ( @", "ESC $ ( A", and "ESC $ ( B".
-
-`no-ascii-eol'
-     If non-nil, don't designate ASCII to G0 at each end of line on output.
-     Setting this to non-nil also suppresses other state-resetting that
-     normally happens at the end of a line.
-
-`no-ascii-cntl'
-     If non-nil, don't designate ASCII to G0 before control chars on output.
-
-`seven'
-     If non-nil, use 7-bit environment on output. Otherwise, use 8-bit
-     environment.
-
-`lock-shift'
-     If non-nil, use locking-shift (SO/SI) instead of single-shift
-     or designation by escape sequence.
-
-`no-iso6429'
-     If non-nil, don't use ISO6429's direction specification.
-
-`escape-quoted'
-     If non-nil, literal control characters that are the same as
-     the beginning of a recognized ISO2022 or ISO6429 escape sequence
-     (in particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E),
-     SS3 (0x8F), and CSI (0x9B)) are "quoted" with an escape character
-     so that they can be properly distinguished from an escape sequence.
-     (Note that doing this results in a non-portable encoding.) This
-     encoding flag is used for byte-compiled files. Note that ESC
-     is a good choice for a quoting character because there are no
-     escape sequences whose second byte is a character from the Control-0
-     or Control-1 character sets; this is explicitly disallowed by the
-     ISO2022 standard.
-
-`input-charset-conversion'
-     A list of conversion specifications, specifying conversion of
-     characters in one charset to another when decoding is performed.
-     Each specification is a list of two elements: the source charset,
-     and the destination charset.
-
-`output-charset-conversion'
-     A list of conversion specifications, specifying conversion of
-     characters in one charset to another when encoding is performed.
-     The form of each specification is the same as for
-     `input-charset-conversion'.
-
-
-
-The following additional properties are recognized (and required)
-if TYPE is `ccl':
-
-`decode'
-     CCL program used for decoding (converting to internal format).
-
-`encode'
-     CCL program used for encoding (converting to external format).
-
-
-The following additional properties are recognized if TYPE is `chain':
-
-`chain'
-     List of coding systems to be chained together, in decoding order.
-
-`canonicalize-after-coding'
-     Coding system to be returned by the detector routines in place of
-     this coding system.
-
-
-
-The following additional properties are recognized if TYPE is `unicode':
-
-`unicode-type'
-     One of `utf-16', `utf-8', `ucs-4', or `utf-7' (the latter is not
-     yet implemented). `utf-16' is the basic two-byte encoding;
-     `ucs-4' is the four-byte encoding; `utf-8' is an ASCII-compatible
-     variable-width 8-bit encoding; `utf-7' is a 7-bit encoding using
-     only characters that will safely pass through all mail gateways.
-     [[ This should be \"transformation format\". There should also be
-     `ucs-2' (or `bmp' -- no surrogates) and `utf-32' (range checked). ]]
-
-`little-endian'
-     If non-nil, `utf-16' and `ucs-4' will write out the groups of two
-     or four bytes little-endian instead of big-endian. This is required,
-     for example, under Windows.
-
-`need-bom'
-     If non-nil, a byte order mark (BOM, or Unicode FFFE) should be
-     written out at the beginning of the data. This serves both to
-     identify the endianness of the following data and to mark the
-     data as Unicode (at least, this is how Windows uses it).
-     [[ The correct term is \"signature\", since this technique may also
-     be used with UTF-8. That is the term used in the standard. ]]
-
-
-The following additional properties are recognized if TYPE is
-`mswindows-multibyte':
-
-`code-page'
-     Either a number (specifying a particular code page) or one of the
-     symbols `ansi', `oem', `mac', or `ebcdic', specifying the ANSI,
-     OEM, Macintosh, or EBCDIC code page associated with a particular
-     locale (given by the `locale' property). NOTE: EBCDIC code pages
-     only exist in Windows 2000 and later.
-
-`locale'
-     If `code-page' is a symbol, this specifies the locale whose code
-     page of the corresponding type should be used. This should be
-     one of the following: A cons of two strings, (LANGUAGE
-     . SUBLANGUAGE) (see `mswindows-set-current-locale'); a string (a
-     language; SUBLANG_DEFAULT, i.e. the default sublanguage, is
-     used); or one of the symbols `current', `user-default', or
-     `system-default', corresponding to the values of
-     `mswindows-current-locale', `mswindows-user-default-locale', or
-     `mswindows-system-default-locale', respectively.
-
-
-
-The following additional properties are recognized if TYPE is `undecided':
-\[[ Doesn't GNU use \"detect-*\" for the following two? ]]
-
-`do-eol'
-     Do EOL detection.
-
-`do-coding'
-     Do encoding detection.
-
-`coding-system'
-     If encoding detection is not done, use the specified coding system
-     to do decoding. This is used internally when implementing coding
-     systems with an EOL type that specifies autodetection (the default),
-     so that the detector routines return the proper subsidiary.
-
-
-
-The following additional property is recognized if TYPE is `gzip':
-
-`level'
-     Compression level: 0 through 9, or `default' (currently 6).
+DEFUN ("make-coding-system-internal", Fmake_coding_system_internal, 2, 4, 0, /*
+See `make-coding-system'. This does much of the work of that function.
+
+Without Mule support, it does all the work of that function, and an alias
+exists, mapping `make-coding-system' to
+`make-coding-system-internal'. You'll need a non-Mule XEmacs to read the
+complete docstring. Or you can just read it in make-coding-system.el;
+something like the following should work:
+
+  \\[find-function-other-window] find-file RET \\[find-file] mule/make-coding-system.el RET

 */
        (name, type, description, props))
@@ -1577,16 +1468,6 @@
   return new_coding_system;
 }

-DEFUN ("coding-system-canonical-name-p", Fcoding_system_canonical_name_p,
-       1, 1, 0, /*
-Return t if OBJECT names a coding system, and is not a coding system alias.
-*/
-       (object))
-{
-  return CODING_SYSTEMP (Fgethash (object, Vcoding_system_hash_table, Qnil))
-    ? Qt : Qnil;
-}
-
 /* #### Shouldn't this really be a find/get pair? */

 DEFUN ("coding-system-alias-p", Fcoding_system_alias_p, 1, 1, 0, /*
@@ -2475,6 +2356,100 @@
                       CODING_ENCODE);
 }

+DEFUN ("query-coding-region", Fquery_coding_region, 3, 7, 0, /*
+Work out whether CODING-SYSTEM can losslessly encode a region.
+
+START and END are the beginning and end of the region to check.
+CODING-SYSTEM is the coding system to try.
+
+Optional argument BUFFER is the buffer to check, and defaults to the current
+buffer.
+
+IGNORE-INVALID-SEQUENCESP, also an optional argument, says to treat XEmacs
+characters which have an unambiguous encoded representation, despite being
+undefined in what they represent, as encodable. These chiefly arise with
+variable-length encodings like UTF-8 and UTF-16, where an invalid sequence
+is passed through to XEmacs as a sequence of characters with a defined
+correspondence to the octets on disk, but no non-error semantics; see the
+`invalid-sequence-coding-system' argument to `set-language-info'.
+
+They can also arise with fixed-length encodings like ISO 8859-7, where
+certain octets on disk have undefined values, and treating them as
+corresponding to the ISO 8859-1 characters with the same numerical values
+may lead to data that is not understood by other applications.
+
+Optional argument ERRORP says to signal a `text-conversion-error' if some
+character in the region cannot be encoded, and defaults to nil.
+
+Optional argument HIGHLIGHT says to display unencodable characters in the
+region using `query-coding-warning-face'. It defaults to nil.
+
+This function can return multiple values; the intention is that callers use
+`multiple-value-bind' or the related CL multiple value functions to deal
+with it. The first result is `t' if the region can be encoded using
+CODING-SYSTEM, or `nil' if not. If the region cannot be encoded using
+CODING-SYSTEM, the second result is a range table describing the positions
+of the unencodable characters.
+
+Ranges that describe characters that would be ignored were
+IGNORE-INVALID-SEQUENCESP non-nil map to the symbol `invalid-sequence';
+other ranges map to the symbol `unencodable'. If IGNORE-INVALID-SEQUENCESP
+is non-nil, all ranges will map to the symbol `unencodable'. See
+`make-range-table' for more details of range tables.
+*/
+       (start, end, coding_system, buffer, ignore_invalid_sequencesp,
+        errorp, highlight))
+{
+  Charbpos b, e;
+  struct buffer *buf = decode_buffer (buffer, 1);
+  Lisp_Object result;
+  int flags = 0, speccount = specpdl_depth ();
+
+  coding_system = Fget_coding_system (coding_system);
+
+  get_buffer_range_char (buf, start, end, &b, &e, 0);
+
+  if (buf != current_buffer)
+    {
+      record_unwind_protect (save_current_buffer_restore, Fcurrent_buffer ());
+      set_buffer_internal (buf);
+    }
+
+  record_unwind_protect (save_excursion_restore, save_excursion_save ());
+
+  BUF_SET_PT (buf, b);
+
+  if (!NILP (ignore_invalid_sequencesp))
+    {
+      flags |= QUERY_METHOD_IGNORE_INVALID_SEQUENCES;
+    }
+
+  if (!NILP (errorp))
+    {
+      flags |= QUERY_METHOD_ERRORP;
+    }
+
+  if (!NILP (highlight))
+    {
+      flags |= QUERY_METHOD_HIGHLIGHT;
+    }
+
+  result = XCODESYSMETH_OR_GIVEN (coding_system, query,
+                                  (coding_system, buf, e, flags), Qunbound);
+
+  if (UNBOUNDP (result))
+    {
+      signal_error (Qtext_conversion_error,
+                    "Coding system doesn't say what it can encode",
+                    XCODING_SYSTEM_NAME (coding_system));
+    }
+
+  result = (NILP (result)) ? Qt : values2 (Qnil, result);
+
+  return unbind_to_1 (speccount, result);
+}
+
+

 /************************************************************************/
 /*                             Chain methods                            */
@@ -4550,7 +4525,7 @@
   DEFSUBR (Fget_coding_system);
   DEFSUBR (Fcoding_system_list);
   DEFSUBR (Fcoding_system_name);
-  DEFSUBR (Fmake_coding_system);
+  DEFSUBR (Fmake_coding_system_internal);
   DEFSUBR (Fcopy_coding_system);
   DEFSUBR (Fcoding_system_canonical_name_p);
   DEFSUBR (Fcoding_system_alias_p);
@@ -4573,6 +4548,7 @@
   DEFSUBR (Fdetect_coding_region);
   DEFSUBR (Fdecode_coding_region);
   DEFSUBR (Fencode_coding_region);
+  DEFSUBR (Fquery_coding_region);
   DEFSYMBOL_MULTIWORD_PREDICATE (Qcoding_systemp);
   DEFSYMBOL (Qno_conversion);
   DEFSYMBOL (Qconvert_eol);
@@ -4621,6 +4597,10 @@

   DEFSYMBOL (Qescape_quoted);

+  DEFSYMBOL (Qquery_coding_warning_face);
+  DEFSYMBOL (Qaliases);
+  DEFSYMBOL (Qcharset_skip_chars_string);
+
 #ifdef HAVE_ZLIB
   DEFSYMBOL (Qgzip);
 #endif
@@ -4844,6 +4824,12 @@
 */ );
   Vdebug_coding_detection = Qnil;
 #endif
+
+#ifdef MULE
+  Vdefault_query_coding_region_chartab_cache
+    = make_lisp_hash_table (25, HASH_TABLE_NON_WEAK, HASH_TABLE_EQUAL);
+  staticpro (&Vdefault_query_coding_region_chartab_cache);
+#endif
 }

 /* #### reformat this for consistent appearance? */
@@ -4851,7 +4837,7 @@
 void
 complex_vars_of_file_coding (void)
 {
-  Fmake_coding_system
+  Fmake_coding_system_internal
     (Qconvert_eol_cr, Qconvert_eol,
      build_msg_string ("Convert CR to LF"),
      nconc2 (list6 (Qdocumentation,
@@ -4863,9 +4849,10 @@
              /* VERY IMPORTANT! Tell make-coding-system not to generate
                 subsidiaries -- it needs the coding systems we're creating
                 to do so! */
-             list2 (Qeol_type, Qlf)));
-
-  Fmake_coding_system
+             list4 (Qeol_type, Qlf,
+                    Qsafe_charsets, Qt)));
+
+  Fmake_coding_system_internal
     (Qconvert_eol_lf, Qconvert_eol,
      build_msg_string ("Convert LF to LF (do nothing)"),
      nconc2 (list6 (Qdocumentation,
@@ -4876,9 +4863,10 @@
              /* VERY IMPORTANT! Tell make-coding-system not to generate
                 subsidiaries -- it needs the coding systems we're creating
                 to do so! */
-             list2 (Qeol_type, Qlf)));
-
-  Fmake_coding_system
+             list4 (Qeol_type, Qlf,
+                    Qsafe_charsets, Qt)));
+
+  Fmake_coding_system_internal
     (Qconvert_eol_crlf, Qconvert_eol,
      build_msg_string ("Convert CRLF to LF"),
      nconc2 (list6 (Qdocumentation,
@@ -4887,12 +4875,14 @@
 "(used internally and under Unix to mark the end of a line)."),
                     Qmnemonic, build_string ("CRLF->LF"),
                     Qsubtype, Qcrlf),
+
              /* VERY IMPORTANT! Tell make-coding-system not to generate
                 subsidiaries -- it needs the coding systems we're creating
                 to do so! */
-             list2 (Qeol_type, Qlf)));
-
-  Fmake_coding_system
+             list4 (Qeol_type, Qlf,
+                    Qsafe_charsets, Qt)));
+
+  Fmake_coding_system_internal
     (Qconvert_eol_autodetect, Qconvert_eol,
      build_msg_string ("Autodetect EOL type"),
      nconc2 (list6 (Qdocumentation,
@@ -4903,9 +4893,10 @@
              /* VERY IMPORTANT! Tell make-coding-system not to generate
                 subsidiaries -- it needs the coding systems we're creating
                 to do so! */
-             list2 (Qeol_type, Qlf)));
-
-  Fmake_coding_system
+             list4 (Qeol_type, Qlf,
+                    Qsafe_charsets, Qt)));
+
+  Fmake_coding_system_internal
     (Qundecided, Qundecided,
      build_msg_string ("Undecided (auto-detect)"),
      nconc2 (list4 (Qdocumentation,
@@ -4918,7 +4909,7 @@
                        though, I don't think.) */
                     Qeol_type, Qlf)));

-  Fmake_coding_system
+  Fmake_coding_system_internal
     (intern ("undecided-dos"), Qundecided,
      build_msg_string ("Undecided (auto-detect) (CRLF)"),
      nconc2 (list4 (Qdocumentation,
@@ -4928,7 +4919,7 @@
              list4 (Qdo_coding, Qt,
                     Qeol_type, Qcrlf)));

-  Fmake_coding_system
+  Fmake_coding_system_internal
     (intern ("undecided-unix"), Qundecided,
      build_msg_string ("Undecided (auto-detect) (LF)"),
      nconc2 (list4 (Qdocumentation,
@@ -4938,7 +4929,7 @@
              list4 (Qdo_coding, Qt,
                     Qeol_type, Qlf)));

-  Fmake_coding_system
+  Fmake_coding_system_internal
     (intern ("undecided-mac"), Qundecided,
      build_msg_string ("Undecided (auto-detect) (CR)"),
      nconc2 (list4 (Qdocumentation,
@@ -4949,26 +4940,42 @@
                     Qeol_type, Qcr)));

   /* Need to create this here or we're really screwed. */
-  Fmake_coding_system
+  Fmake_coding_system_internal
     (Qraw_text, Qno_conversion,
      build_msg_string ("Raw Text"),
-     list4 (Qdocumentation,
-            build_msg_string ("Raw text converts only line-break codes, and acts otherwise like `binary'."),
-            Qmnemonic, build_string ("Raw")));
-
-  Fmake_coding_system
+     nconc2 (list4 (Qdocumentation,
+                    build_msg_string ("Raw text converts only line-break "
+                                      "codes, and acts otherwise like "
+                                      "`binary'."),
+                    Qmnemonic, build_string ("Raw")),
+#ifdef MULE
+             list2 (Qsafe_charsets, list3 (Vcharset_ascii, Vcharset_control_1,
+                                           Vcharset_latin_iso8859_1))));
+
+#else
+             Qnil));
+#endif
+
+  Fmake_coding_system_internal
     (Qbinary, Qno_conversion,
      build_msg_string ("Binary"),
-     list6 (Qdocumentation,
-            build_msg_string (
+     nconc2 (list6 (Qdocumentation,
+                    build_msg_string (
 "This coding system is as close as it comes to doing no conversion.\n"
 "On input, each byte is converted directly into the character\n"
 "with the corresponding code -- i.e. from the `ascii', `control-1',\n"
 "or `latin-1' character sets. On output, these characters are\n"
 "converted back to the corresponding bytes, and other characters\n"
 "are converted to the default character, i.e.
`~'."), - Qeol_type, Qlf, - Qmnemonic, build_string ("Binary"))); + Qeol_type, Qlf, + Qmnemonic, build_string ("Binary")), +#ifdef MULE + list2 (Qsafe_charsets, list3 (Vcharset_ascii, Vcharset_control_1, + Vcharset_latin_iso8859_1)))); + +#else + Qnil)); +#endif /* Formerly aliased to raw-text! Completely bogus and not even the same as FSF Emacs. */