Mercurial > hg > xemacs-beta
diff lisp/unicode.el @ 4690:257b468bf2ca
Move the #'query-coding-region implementation to C.
This is necessary because there is no reasonable way to access the
corresponding mswindows-multibyte functionality from Lisp, and we need such
functionality if we're going to have a reliable and portable
#'query-coding-region implementation. However, this change doesn't yet
provide #'query-coding-region for the mswindow-multibyte coding systems,
there should be no functional differences between an XEmacs with this change
and one without it.
src/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
Move the #'query-coding-region implementation to C.
This is necessary because there is no reasonable way to access the
corresponding mswindows-multibyte functionality from Lisp, and we
need such functionality if we're going to have a reliable and
portable #'query-coding-region implementation. However, this
change doesn't yet provide #'query-coding-region for the
mswindow-multibyte coding systems, there should be no functional
differences between an XEmacs with this change and one without it.
* mule-coding.c (struct fixed_width_coding_system):
Add a new coding system type, fixed_width, and implement it. It
uses the CCL infrastructure but has a much simpler creation API,
and its own query_method, formerly in lisp/mule/mule-coding.el.
* unicode.c:
Move the Unicode query method implementation here from
unicode.el.
* lisp.h: Declare Fmake_coding_system_internal, Fcopy_range_table
here.
* intl-win32.c (complex_vars_of_intl_win32):
Use Fmake_coding_system_internal, not Fmake_coding_system.
* general-slots.h: Add Qsucceeded, Qunencodable, Qinvalid_sequence
here.
* file-coding.h (enum coding_system_variant):
Add fixed_width_coding_system here.
(struct coding_system_methods):
Add query_method and query_lstream_method to the coding system
methods.
Provide flags for the query methods.
Declare the default query method; initialise it correctly in
INITIALIZE_CODING_SYSTEM_TYPE.
* file-coding.c (default_query_method):
New function, the default query method for coding systems that do
not set it. Moved from coding.el.
(make_coding_system_1):
Accept new elements in PROPS in #'make-coding-system; aliases, a
list of aliases; safe-chars and safe-charsets (these were
previously accepted but not saved); and category.
(Fmake_coding_system_internal):
New function, what used to be #'make-coding-system--on Mule
builds, we've now moved some of the functionality of this to
Lisp.
(Fcoding_system_canonical_name_p):
Move this earlier in the file, since it's now called from within
make_coding_system_1.
(Fquery_coding_region):
Move the implementation of this here, from coding.el.
(complex_vars_of_file_coding):
Call Fmake_coding_system_internal, not Fmake_coding_system;
specify safe-charsets properties when we're a mule build.
* extents.h (mouse_highlight_priority, Fset_extent_priority,
Fset_extent_face, Fmap_extents):
Make these available to other C files.
lisp/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
Move the #'query-coding-region implementation to C.
* coding.el:
Consolidate code that depends on the presence or absence of Mule
at the end of this file.
(default-query-coding-region, query-coding-region):
Move these functions to C.
(default-query-coding-region-safe-charset-skip-chars-map):
Remove this variable, the corresponding C variable is
Vdefault_query_coding_region_chartab_cache in file-coding.c.
(query-coding-string): Update docstring to reflect actual multiple
values, be more careful about not modifying a range table that
we're currently mapping over.
(encode-coding-char): Make the implementation of this simpler.
(featurep 'mule): Autoload #'make-coding-system from
mule/make-coding-system.el if we're a mule build; provide an
appropriate compiler macro.
Do various non-mule compatibility things if we're not a mule
build.
* update-elc.el (additional-dump-dependencies):
Add mule/make-coding-system as a dump time dependency if we're a
mule build.
* unicode.el (ccl-encode-to-ucs-2):
(decode-char):
(encode-char):
Move these earlier in the file, for the sake of some byte compile
warnings.
(unicode-query-coding-region):
Move this to unicode.c
* mule/make-coding-system.el:
New file, not dumped. Contains the functionality to rework the
arguments necessary for fixed-width coding systems, and contains
the implementation of #'make-coding-system, which now calls
#'make-coding-system-internal.
* mule/vietnamese.el (viscii):
* mule/latin.el (iso-8859-2):
(windows-1250):
(iso-8859-3):
(iso-8859-4):
(iso-8859-14):
(iso-8859-15):
(iso-8859-16):
(iso-8859-9):
(macintosh):
(windows-1252):
* mule/hebrew.el (iso-8859-8):
* mule/greek.el (iso-8859-7):
(windows-1253):
* mule/cyrillic.el (iso-8859-5):
(koi8-r):
(koi8-u):
(windows-1251):
(alternativnyj):
(koi8-ru):
(koi8-t):
(koi8-c):
(koi8-o):
* mule/arabic.el (iso-8859-6):
(windows-1256):
Move all these coding systems to being of type fixed-width, not of
type CCL. This allows the distinct query-coding-region for them to
be in C, something which will eventually allow us to implement
query-coding-region for the mswindows-multibyte coding systems.
* mule/general-late.el (posix-charset-to-coding-system-hash):
Document why we're pre-emptively persuading the byte compiler that
the ELC for this file needs to be written using escape-quoted.
Call #'set-unicode-query-skip-chars-args, now the Unicode
query-coding-region implementation is in C.
* mule/thai-xtis.el (tis-620):
Don't bother checking whether we're XEmacs or not here.
* mule/mule-coding.el:
Move the eight bit fixed-width functionality from this file to
make-coding-system.el.
tests/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
* automated/mule-tests.el:
Check a coding system's type, not an 8-bit-fixed property, for
whether that coding system should be treated as a fixed-width
coding system.
* automated/query-coding-tests.el:
Don't test the query coding functionality for mswindows-multibyte
coding systems, it's not yet implemented.
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Sat, 19 Sep 2009 22:53:13 +0100 |
parents | 75e7ab37b6c8 |
children | e29fcfd8df5f |
line wrap: on
line diff
--- a/lisp/unicode.el Sat Sep 19 17:56:23 2009 +0200 +++ b/lisp/unicode.el Sat Sep 19 22:53:13 2009 +0100 @@ -164,6 +164,68 @@ latin-jisx0201 chinese-cns11643-3 chinese-cns11643-4 chinese-cns11643-5 chinese-cns11643-6 chinese-cns11643-7))))) +(defconst ccl-encode-to-ucs-2 + (eval-when-compile + (let ((pre-existing + ;; This is the compiled CCL program from the assert + ;; below. Since this file is dumped and ccl.el isn't (and + ;; even when it was, it was dumped much later than this + ;; one), we can't compile the program at dump time. We can + ;; check at byte compile time that the program is as + ;; expected, though. + [1 16 131127 7 98872 65823 1307 5 -65536 65313 64833 1028 + 147513 8 82009 255 22])) + (when (featurep 'mule) + ;; Check that the pre-existing constant reflects the intended + ;; CCL program. + (assert + (equal pre-existing + (ccl-compile + `(1 + ( ;; mule-to-unicode's first argument is the + ;; charset ID, the second its first byte + ;; left shifted by 7 bits masked with its + ;; second byte. + (r1 = (r1 << 7)) + (r1 = (r1 | r2)) + (mule-to-unicode r0 r1) + (if (r0 & ,(lognot #xFFFF)) + ;; Redisplay looks in r1 and r2 for the first + ;; and second bytes of the X11 font, + ;; respectively. For non-BMP characters we + ;; display U+FFFD. + ((r1 = #xFF) + (r2 = #xFD)) + ((r1 = (r0 >> 8)) + (r2 = (r0 & #xFF)))))))) + nil + "The pre-compiled CCL program appears broken. ")) + pre-existing)) + "CCL program to transform Mule characters to UCS-2.") + +(when (featurep 'mule) + (put 'ccl-encode-to-ucs-2 'ccl-program-idx + (declare-fboundp + (register-ccl-program 'ccl-encode-to-ucs-2 ccl-encode-to-ucs-2)))) + +(defun decode-char (quote-ucs code &optional restriction) + "FSF compatibility--return Mule character with Unicode codepoint CODE. +The second argument must be 'ucs, the third argument is ignored. " + ;; We're prepared to accept invalid Unicode in unicode-to-char, but not in + ;; this function, which is the API that should actually be used, since + ;; it's available in GNU and in Mule-UCS. + (check-argument-range code #x0 #x10FFFF) + (assert (eq quote-ucs 'ucs) t + "Sorry, decode-char doesn't yet support anything but the UCS. ") + (unicode-to-char code)) + +(defun encode-char (char quote-ucs &optional restriction) + "FSF compatibility--return the Unicode code point of CHAR. +The second argument must be 'ucs, the third argument is ignored. " + (assert (eq quote-ucs 'ucs) t + "Sorry, encode-char doesn't yet support anything but the UCS. ") + (char-to-unicode char)) + (make-coding-system 'utf-16 'unicode "UTF-16" @@ -309,68 +371,6 @@ little-endian t need-bom t)) -(defun decode-char (quote-ucs code &optional restriction) - "FSF compatibility--return Mule character with Unicode codepoint CODE. -The second argument must be 'ucs, the third argument is ignored. " - ;; We're prepared to accept invalid Unicode in unicode-to-char, but not in - ;; this function, which is the API that should actually be used, since - ;; it's available in GNU and in Mule-UCS. - (check-argument-range code #x0 #x10FFFF) - (assert (eq quote-ucs 'ucs) t - "Sorry, decode-char doesn't yet support anything but the UCS. ") - (unicode-to-char code)) - -(defun encode-char (char quote-ucs &optional restriction) - "FSF compatibility--return the Unicode code point of CHAR. -The second argument must be 'ucs, the third argument is ignored. " - (assert (eq quote-ucs 'ucs) t - "Sorry, encode-char doesn't yet support anything but the UCS. ") - (char-to-unicode char)) - -(defconst ccl-encode-to-ucs-2 - (eval-when-compile - (let ((pre-existing - ;; This is the compiled CCL program from the assert - ;; below. Since this file is dumped and ccl.el isn't (and - ;; even when it was, it was dumped much later than this - ;; one), we can't compile the program at dump time. We can - ;; check at byte compile time that the program is as - ;; expected, though. - [1 16 131127 7 98872 65823 1307 5 -65536 65313 64833 1028 - 147513 8 82009 255 22])) - (when (featurep 'mule) - ;; Check that the pre-existing constant reflects the intended - ;; CCL program. - (assert - (equal pre-existing - (ccl-compile - `(1 - ( ;; mule-to-unicode's first argument is the - ;; charset ID, the second its first byte - ;; left shifted by 7 bits masked with its - ;; second byte. - (r1 = (r1 << 7)) - (r1 = (r1 | r2)) - (mule-to-unicode r0 r1) - (if (r0 & ,(lognot #xFFFF)) - ;; Redisplay looks in r1 and r2 for the first - ;; and second bytes of the X11 font, - ;; respectively. For non-BMP characters we - ;; display U+FFFD. - ((r1 = #xFF) - (r2 = #xFD)) - ((r1 = (r0 >> 8)) - (r2 = (r0 & #xFF)))))))) - nil - "The pre-compiled CCL program appears broken. ")) - pre-existing)) - "CCL program to transform Mule characters to UCS-2.") - -(when (featurep 'mule) - (put 'ccl-encode-to-ucs-2 'ccl-program-idx - (declare-fboundp - (register-ccl-program 'ccl-encode-to-ucs-2 ccl-encode-to-ucs-2)))) - ;; Now, create jit-ucs-charset-0 entries for those characters in Windows ;; Glyph List 4 that would otherwise end up in East Asian character sets. ;; @@ -613,112 +613,6 @@ ;; Sure would be nice to be able to use defface here. (copy-face 'highlight 'unicode-invalid-sequence-warning-face) -(defvar unicode-query-coding-skip-chars-arg nil ;; Set in general-late.el - "Used by `unicode-query-coding-region' to skip chars with known mappings.") - -(defun unicode-query-coding-region (begin end coding-system - &optional buffer ignore-invalid-sequencesp - errorp highlightp) - "The `query-coding-region' implementation for Unicode coding systems. - -Supports IGNORE-INVALID-SEQUENCESP, that is, XEmacs characters that reflect -invalid octets on disk will be treated as encodable if this argument is -specified, and as not encodable if it is not specified." - - ;; Potential problem here; the octets that correspond to octets from #x00 - ;; to #x7f on disk will be treated by utf-8 and utf-7 as invalid - ;; sequences, and thus, in theory, encodable. - - (check-argument-type #'coding-system-p - (setq coding-system (find-coding-system coding-system))) - (check-argument-type #'integer-or-marker-p begin) - (check-argument-type #'integer-or-marker-p end) - (let* ((skip-chars-arg (concat unicode-query-coding-skip-chars-arg - (if ignore-invalid-sequencesp - unicode-invalid-sequence-regexp-range - ""))) - (ranges (make-range-table)) - (looking-at-arg (concat "[" skip-chars-arg "]")) - (case-fold-search nil) - (invalid-sequence-lower-unicode-bound - (char-to-unicode - (aref (decode-coding-string "\xd8\x00\x00\x00" - 'utf-16-be) 3))) - (invalid-sequence-upper-unicode-bound - (char-to-unicode - (aref (decode-coding-string "\xd8\x00\x00\xFF" - 'utf-16-be) 3))) - fail-range-start fail-range-end char-after failed - extent char-unicode failed-reason previous-failed-reason) - (save-excursion - (when highlightp - (query-coding-clear-highlights begin end buffer)) - (goto-char begin buffer) - (skip-chars-forward skip-chars-arg end buffer) - (while (< (point buffer) end) - (setq char-after (char-after (point buffer) buffer) - fail-range-start (point buffer)) - (while (and - (< (point buffer) end) - (not (looking-at looking-at-arg)) - (or (and - (= -1 (setq char-unicode (char-to-unicode char-after))) - (setq failed-reason 'unencodable)) - (and (not ignore-invalid-sequencesp) - ;; The default case, with ignore-invalid-sequencesp - ;; not specified: - ;; If the character is in the Unicode range that - ;; corresponds to an invalid octet, we want to - ;; treat it as unencodable. - (<= invalid-sequence-lower-unicode-bound - char-unicode) - (<= char-unicode - invalid-sequence-upper-unicode-bound) - (setq failed-reason 'invalid-sequence))) - (or (null previous-failed-reason) - (eq previous-failed-reason failed-reason))) - (forward-char 1 buffer) - (setq char-after (char-after (point buffer) buffer) - failed t - previous-failed-reason failed-reason)) - (if (= fail-range-start (point buffer)) - ;; The character can actually be encoded by the coding - ;; system; check the characters past it. - (forward-char 1 buffer) - ;; Can't be encoded; note this. - (when errorp - (error 'text-conversion-error - (format "Cannot encode %s using coding system" - (buffer-substring fail-range-start (point buffer) - buffer)) - (coding-system-name coding-system))) - (assert - (not (null previous-failed-reason)) t - "If we've got here, previous-failed-reason should be non-nil.") - (put-range-table fail-range-start - ;; If char-after is non-nil, we're not at - ;; the end of the buffer. - (setq fail-range-end (if char-after - (point buffer) - (point-max buffer))) - previous-failed-reason ranges) - (setq previous-failed-reason nil) - (when highlightp - (setq extent (make-extent fail-range-start fail-range-end buffer)) - (set-extent-priority extent (+ mouse-highlight-priority 2)) - (set-extent-face extent 'query-coding-warning-face))) - (skip-chars-forward skip-chars-arg end buffer)) - (if failed - (values nil ranges) - (values t nil))))) - -(loop - for coding-system in (coding-system-list) - initially (unless (featurep 'mule) (return)) - do (when (eq 'unicode (coding-system-type coding-system)) - (coding-system-put coding-system 'query-coding-function - #'unicode-query-coding-region))) - (unless (featurep 'mule) ;; We do this in such a roundabout way--instead of having the above defun ;; and defvar calls inside a (when (featurep 'mule) ...) form--to have