Mercurial > hg > xemacs-beta
comparison src/mule-coding.c @ 4690:257b468bf2ca
Move the #'query-coding-region implementation to C.
This is necessary because there is no reasonable way to access the
corresponding mswindows-multibyte functionality from Lisp, and we need such
functionality if we're going to have a reliable and portable
#'query-coding-region implementation. However, this change doesn't yet
provide #'query-coding-region for the mswindow-multibyte coding systems,
there should be no functional differences between an XEmacs with this change
and one without it.
src/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
Move the #'query-coding-region implementation to C.
This is necessary because there is no reasonable way to access the
corresponding mswindows-multibyte functionality from Lisp, and we
need such functionality if we're going to have a reliable and
portable #'query-coding-region implementation. However, this
change doesn't yet provide #'query-coding-region for the
mswindow-multibyte coding systems, there should be no functional
differences between an XEmacs with this change and one without it.
* mule-coding.c (struct fixed_width_coding_system):
Add a new coding system type, fixed_width, and implement it. It
uses the CCL infrastructure but has a much simpler creation API,
and its own query_method, formerly in lisp/mule/mule-coding.el.
* unicode.c:
Move the Unicode query method implementation here from
unicode.el.
* lisp.h: Declare Fmake_coding_system_internal, Fcopy_range_table
here.
* intl-win32.c (complex_vars_of_intl_win32):
Use Fmake_coding_system_internal, not Fmake_coding_system.
* general-slots.h: Add Qsucceeded, Qunencodable, Qinvalid_sequence
here.
* file-coding.h (enum coding_system_variant):
Add fixed_width_coding_system here.
(struct coding_system_methods):
Add query_method and query_lstream_method to the coding system
methods.
Provide flags for the query methods.
Declare the default query method; initialise it correctly in
INITIALIZE_CODING_SYSTEM_TYPE.
* file-coding.c (default_query_method):
New function, the default query method for coding systems that do
not set it. Moved from coding.el.
(make_coding_system_1):
Accept new elements in PROPS in #'make-coding-system; aliases, a
list of aliases; safe-chars and safe-charsets (these were
previously accepted but not saved); and category.
(Fmake_coding_system_internal):
New function, what used to be #'make-coding-system--on Mule
builds, we've now moved some of the functionality of this to
Lisp.
(Fcoding_system_canonical_name_p):
Move this earlier in the file, since it's now called from within
make_coding_system_1.
(Fquery_coding_region):
Move the implementation of this here, from coding.el.
(complex_vars_of_file_coding):
Call Fmake_coding_system_internal, not Fmake_coding_system;
specify safe-charsets properties when we're a mule build.
* extents.h (mouse_highlight_priority, Fset_extent_priority,
Fset_extent_face, Fmap_extents):
Make these available to other C files.
lisp/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
Move the #'query-coding-region implementation to C.
* coding.el:
Consolidate code that depends on the presence or absence of Mule
at the end of this file.
(default-query-coding-region, query-coding-region):
Move these functions to C.
(default-query-coding-region-safe-charset-skip-chars-map):
Remove this variable, the corresponding C variable is
Vdefault_query_coding_region_chartab_cache in file-coding.c.
(query-coding-string): Update docstring to reflect actual multiple
values, be more careful about not modifying a range table that
we're currently mapping over.
(encode-coding-char): Make the implementation of this simpler.
(featurep 'mule): Autoload #'make-coding-system from
mule/make-coding-system.el if we're a mule build; provide an
appropriate compiler macro.
Do various non-mule compatibility things if we're not a mule
build.
* update-elc.el (additional-dump-dependencies):
Add mule/make-coding-system as a dump time dependency if we're a
mule build.
* unicode.el (ccl-encode-to-ucs-2):
(decode-char):
(encode-char):
Move these earlier in the file, for the sake of some byte compile
warnings.
(unicode-query-coding-region):
Move this to unicode.c
* mule/make-coding-system.el:
New file, not dumped. Contains the functionality to rework the
arguments necessary for fixed-width coding systems, and contains
the implementation of #'make-coding-system, which now calls
#'make-coding-system-internal.
* mule/vietnamese.el (viscii):
* mule/latin.el (iso-8859-2):
(windows-1250):
(iso-8859-3):
(iso-8859-4):
(iso-8859-14):
(iso-8859-15):
(iso-8859-16):
(iso-8859-9):
(macintosh):
(windows-1252):
* mule/hebrew.el (iso-8859-8):
* mule/greek.el (iso-8859-7):
(windows-1253):
* mule/cyrillic.el (iso-8859-5):
(koi8-r):
(koi8-u):
(windows-1251):
(alternativnyj):
(koi8-ru):
(koi8-t):
(koi8-c):
(koi8-o):
* mule/arabic.el (iso-8859-6):
(windows-1256):
Move all these coding systems to being of type fixed-width, not of
type CCL. This allows the distinct query-coding-region for them to
be in C, something which will eventually allow us to implement
query-coding-region for the mswindows-multibyte coding systems.
* mule/general-late.el (posix-charset-to-coding-system-hash):
Document why we're pre-emptively persuading the byte compiler that
the ELC for this file needs to be written using escape-quoted.
Call #'set-unicode-query-skip-chars-args, now the Unicode
query-coding-region implementation is in C.
* mule/thai-xtis.el (tis-620):
Don't bother checking whether we're XEmacs or not here.
* mule/mule-coding.el:
Move the eight bit fixed-width functionality from this file to
make-coding-system.el.
tests/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
* automated/mule-tests.el:
Check a coding system's type, not an 8-bit-fixed property, for
whether that coding system should be treated as a fixed-width
coding system.
* automated/query-coding-tests.el:
Don't test the query coding functionality for mswindows-multibyte
coding systems, it's not yet implemented.
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Sat, 19 Sep 2009 22:53:13 +0100 |
parents | 726060ee587c |
children | 3ba90c659d01 |
comparison
equal
deleted
inserted
replaced
4689:0636c6ccb430 | 4690:257b468bf2ca |
---|---|
34 #include "lisp.h" | 34 #include "lisp.h" |
35 | 35 |
36 #include "charset.h" | 36 #include "charset.h" |
37 #include "mule-ccl.h" | 37 #include "mule-ccl.h" |
38 #include "file-coding.h" | 38 #include "file-coding.h" |
39 #include "elhash.h" | |
40 #include "rangetab.h" | |
41 #include "buffer.h" | |
42 #include "extents.h" | |
39 | 43 |
40 Lisp_Object Qshift_jis, Qiso2022, Qbig5, Qccl; | 44 Lisp_Object Qshift_jis, Qiso2022, Qbig5, Qccl; |
41 | 45 |
42 Lisp_Object Qcharset_g0, Qcharset_g1, Qcharset_g2, Qcharset_g3; | 46 Lisp_Object Qcharset_g0, Qcharset_g1, Qcharset_g2, Qcharset_g3; |
43 Lisp_Object Qforce_g0_on_output, Qforce_g1_on_output; | 47 Lisp_Object Qforce_g0_on_output, Qforce_g1_on_output; |
45 Lisp_Object Qno_iso6429; | 49 Lisp_Object Qno_iso6429; |
46 Lisp_Object Qinput_charset_conversion, Qoutput_charset_conversion; | 50 Lisp_Object Qinput_charset_conversion, Qoutput_charset_conversion; |
47 Lisp_Object Qshort, Qno_ascii_eol, Qno_ascii_cntl, Qseven, Qlock_shift; | 51 Lisp_Object Qshort, Qno_ascii_eol, Qno_ascii_cntl, Qseven, Qlock_shift; |
48 | 52 |
49 Lisp_Object Qiso_7, Qiso_8_designate, Qiso_8_1, Qiso_8_2, Qiso_lock_shift; | 53 Lisp_Object Qiso_7, Qiso_8_designate, Qiso_8_1, Qiso_8_2, Qiso_lock_shift; |
54 | |
55 Lisp_Object Qfrom_unicode, Qquery_skip_chars, Qinvalid_sequences_skip_chars; | |
56 Lisp_Object Qfixed_width; | |
50 | 57 |
51 | 58 |
52 /************************************************************************/ | 59 /************************************************************************/ |
53 /* Shift-JIS methods */ | 60 /* Shift-JIS methods */ |
54 /************************************************************************/ | 61 /************************************************************************/ |
3386 else if (EQ (prop, Qencode)) | 3393 else if (EQ (prop, Qencode)) |
3387 return XCODING_SYSTEM_CCL_ENCODE (coding_system); | 3394 return XCODING_SYSTEM_CCL_ENCODE (coding_system); |
3388 else | 3395 else |
3389 return Qunbound; | 3396 return Qunbound; |
3390 } | 3397 } |
3398 | |
3399 /************************************************************************/ | |
3400 /* FIXED_WIDTH methods */ | |
3401 /************************************************************************/ | |
3402 | |
3403 struct fixed_width_coding_system | |
3404 { | |
3405 /* For a fixed_width coding system, these specify the CCL programs | |
3406 used for decoding (input) and encoding (output). */ | |
3407 Lisp_Object decode; | |
3408 Lisp_Object encode; | |
3409 Lisp_Object from_unicode; | |
3410 Lisp_Object invalid_sequences_skip_chars; | |
3411 Lisp_Object query_skip_chars; | |
3412 | |
3413 /* This is not directly accessible from Lisp; it is a concatenation of the | |
3414 previous two strings, used for simplicity of implementation. */ | |
3415 Lisp_Object invalid_and_query_skip_chars; | |
3416 }; | |
3417 | |
3418 #define CODING_SYSTEM_FIXED_WIDTH_DECODE(codesys) \ | |
3419 (CODING_SYSTEM_TYPE_DATA (codesys, fixed_width)->decode) | |
3420 #define CODING_SYSTEM_FIXED_WIDTH_ENCODE(codesys) \ | |
3421 (CODING_SYSTEM_TYPE_DATA (codesys, fixed_width)->encode) | |
3422 #define CODING_SYSTEM_FIXED_WIDTH_FROM_UNICODE(codesys) \ | |
3423 (CODING_SYSTEM_TYPE_DATA (codesys, fixed_width)->from_unicode) | |
3424 #define CODING_SYSTEM_FIXED_WIDTH_INVALID_SEQUENCES_SKIP_CHARS(codesys) \ | |
3425 (CODING_SYSTEM_TYPE_DATA (codesys, \ | |
3426 fixed_width)->invalid_sequences_skip_chars) | |
3427 #define CODING_SYSTEM_FIXED_WIDTH_QUERY_SKIP_CHARS(codesys) \ | |
3428 (CODING_SYSTEM_TYPE_DATA (codesys, fixed_width)->query_skip_chars) | |
3429 #define CODING_SYSTEM_FIXED_WIDTH_INVALID_AND_QUERY_SKIP_CHARS(codesys) \ | |
3430 (CODING_SYSTEM_TYPE_DATA (codesys, \ | |
3431 fixed_width)->invalid_and_query_skip_chars) | |
3432 | |
3433 #define XCODING_SYSTEM_FIXED_WIDTH_DECODE(codesys) \ | |
3434 CODING_SYSTEM_FIXED_WIDTH_DECODE (XCODING_SYSTEM (codesys)) | |
3435 #define XCODING_SYSTEM_FIXED_WIDTH_ENCODE(codesys) \ | |
3436 CODING_SYSTEM_FIXED_WIDTH_ENCODE (XCODING_SYSTEM (codesys)) | |
3437 #define XCODING_SYSTEM_FIXED_WIDTH_FROM_UNICODE(codesys) \ | |
3438 (CODING_SYSTEM_FIXED_WIDTH_FROM_UNICODE (XCODING_SYSTEM (codesys))) | |
3439 #define XCODING_SYSTEM_FIXED_WIDTH_INVALID_SEQUENCES_SKIP_CHARS(codesys) \ | |
3440 (CODING_SYSTEM_FIXED_WIDTH_INVALID_SEQUENCES_SKIP_CHARS \ | |
3441 (XCODING_SYSTEM (codesys))) | |
3442 #define XCODING_SYSTEM_FIXED_WIDTH_QUERY_SKIP_CHARS(codesys) \ | |
3443 (CODING_SYSTEM_FIXED_WIDTH_QUERY_SKIP_CHARS (XCODING_SYSTEM (codesys))) | |
3444 #define XCODING_SYSTEM_FIXED_WIDTH_INVALID_AND_QUERY_SKIP_CHARS(codesys) \ | |
3445 (CODING_SYSTEM_FIXED_WIDTH_INVALID_AND_QUERY_SKIP_CHARS \ | |
3446 (XCODING_SYSTEM(codesys))) | |
3447 | |
3448 struct fixed_width_coding_stream | |
3449 { | |
3450 /* state of the running CCL program */ | |
3451 struct ccl_program ccl; | |
3452 }; | |
3453 | |
3454 static const struct memory_description | |
3455 fixed_width_coding_system_description[] = { | |
3456 { XD_LISP_OBJECT, offsetof (struct fixed_width_coding_system, decode) }, | |
3457 { XD_LISP_OBJECT, offsetof (struct fixed_width_coding_system, encode) }, | |
3458 { XD_LISP_OBJECT, offsetof (struct fixed_width_coding_system, | |
3459 from_unicode) }, | |
3460 { XD_LISP_OBJECT, offsetof (struct fixed_width_coding_system, | |
3461 invalid_sequences_skip_chars) }, | |
3462 { XD_LISP_OBJECT, offsetof (struct fixed_width_coding_system, | |
3463 query_skip_chars) }, | |
3464 { XD_LISP_OBJECT, offsetof (struct fixed_width_coding_system, | |
3465 invalid_and_query_skip_chars) }, | |
3466 { XD_END } | |
3467 }; | |
3468 | |
3469 DEFINE_CODING_SYSTEM_TYPE_WITH_DATA (fixed_width); | |
3470 | |
3471 static void | |
3472 fixed_width_mark (Lisp_Object codesys) | |
3473 { | |
3474 mark_object (XCODING_SYSTEM_FIXED_WIDTH_DECODE (codesys)); | |
3475 mark_object (XCODING_SYSTEM_FIXED_WIDTH_ENCODE (codesys)); | |
3476 mark_object (XCODING_SYSTEM_FIXED_WIDTH_FROM_UNICODE (codesys)); | |
3477 mark_object | |
3478 (XCODING_SYSTEM_FIXED_WIDTH_INVALID_SEQUENCES_SKIP_CHARS (codesys)); | |
3479 mark_object (XCODING_SYSTEM_FIXED_WIDTH_QUERY_SKIP_CHARS (codesys) ); | |
3480 mark_object | |
3481 (XCODING_SYSTEM_FIXED_WIDTH_INVALID_AND_QUERY_SKIP_CHARS(codesys)); | |
3482 } | |
3483 | |
3484 static Bytecount | |
3485 fixed_width_convert (struct coding_stream *str, const UExtbyte *src, | |
3486 unsigned_char_dynarr *dst, Bytecount n) | |
3487 { | |
3488 struct fixed_width_coding_stream *data = | |
3489 CODING_STREAM_TYPE_DATA (str, fixed_width); | |
3490 Bytecount orign = n; | |
3491 | |
3492 data->ccl.last_block = str->eof; | |
3493 /* When applying a CCL program to a stream, SRC must not be NULL -- this | |
3494 is a special signal to the driver that read and write operations are | |
3495 not allowed. The code does not actually look at what SRC points to if | |
3496 N == 0. */ | |
3497 ccl_driver (&data->ccl, src ? src : (const unsigned char *) "", | |
3498 dst, n, 0, | |
3499 str->direction == CODING_DECODE ? CCL_MODE_DECODING : | |
3500 CCL_MODE_ENCODING); | |
3501 return orign; | |
3502 } | |
3503 | |
3504 static void | |
3505 fixed_width_init_coding_stream (struct coding_stream *str) | |
3506 { | |
3507 struct fixed_width_coding_stream *data = | |
3508 CODING_STREAM_TYPE_DATA (str, fixed_width); | |
3509 | |
3510 setup_ccl_program (&data->ccl, | |
3511 str->direction == CODING_DECODE ? | |
3512 XCODING_SYSTEM_FIXED_WIDTH_DECODE (str->codesys) : | |
3513 XCODING_SYSTEM_FIXED_WIDTH_ENCODE (str->codesys)); | |
3514 } | |
3515 | |
3516 static void | |
3517 fixed_width_rewind_coding_stream (struct coding_stream *str) | |
3518 { | |
3519 fixed_width_init_coding_stream (str); | |
3520 } | |
3521 | |
3522 static void | |
3523 fixed_width_init (Lisp_Object codesys) | |
3524 { | |
3525 XCODING_SYSTEM_FIXED_WIDTH_DECODE (codesys) = Qnil; | |
3526 XCODING_SYSTEM_FIXED_WIDTH_ENCODE (codesys) = Qnil; | |
3527 XCODING_SYSTEM_FIXED_WIDTH_FROM_UNICODE (codesys) = Qnil; | |
3528 XCODING_SYSTEM_FIXED_WIDTH_INVALID_SEQUENCES_SKIP_CHARS (codesys) = Qnil; | |
3529 XCODING_SYSTEM_FIXED_WIDTH_QUERY_SKIP_CHARS (codesys) = Qnil; | |
3530 XCODING_SYSTEM_FIXED_WIDTH_INVALID_AND_QUERY_SKIP_CHARS(codesys) = Qnil; | |
3531 } | |
3532 | |
3533 static int | |
3534 fixed_width_putprop (Lisp_Object codesys, Lisp_Object key, | |
3535 Lisp_Object value) | |
3536 { | |
3537 struct ccl_program test_ccl; | |
3538 | |
3539 if (EQ (key, Qdecode) || EQ (key, Qencode)) | |
3540 { | |
3541 /* Check if the CCL infrastructure thinks this is a sane CCL | |
3542 program: */ | |
3543 if (setup_ccl_program (&test_ccl, value) < 0) | |
3544 { | |
3545 invalid_argument ("Invalid CCL program", value); | |
3546 } | |
3547 | |
3548 if (EQ (key, Qdecode)) | |
3549 { | |
3550 XCODING_SYSTEM_FIXED_WIDTH_DECODE (codesys) = value; | |
3551 } | |
3552 else | |
3553 { | |
3554 XCODING_SYSTEM_FIXED_WIDTH_ENCODE (codesys) = value; | |
3555 } | |
3556 } | |
3557 else if (EQ (key, Qfrom_unicode)) | |
3558 { | |
3559 CHECK_HASH_TABLE (value); | |
3560 XCODING_SYSTEM_FIXED_WIDTH_FROM_UNICODE (codesys) = value; | |
3561 } | |
3562 else if (EQ (key, Qinvalid_sequences_skip_chars)) | |
3563 { | |
3564 CHECK_STRING (value); | |
3565 | |
3566 /* Make sure Lisp can't make our data inconsistent: */ | |
3567 value = Fcopy_sequence (value); | |
3568 | |
3569 XCODING_SYSTEM_FIXED_WIDTH_INVALID_SEQUENCES_SKIP_CHARS (codesys) | |
3570 = value; | |
3571 | |
3572 XCODING_SYSTEM_FIXED_WIDTH_INVALID_AND_QUERY_SKIP_CHARS (codesys) | |
3573 = concat2 (value, | |
3574 XCODING_SYSTEM_FIXED_WIDTH_QUERY_SKIP_CHARS (codesys)); | |
3575 } | |
3576 else if (EQ (key, Qquery_skip_chars)) | |
3577 { | |
3578 CHECK_STRING (value); | |
3579 | |
3580 /* Make sure Lisp can't make our data inconsistent: */ | |
3581 value = Fcopy_sequence (value); | |
3582 | |
3583 XCODING_SYSTEM_FIXED_WIDTH_QUERY_SKIP_CHARS (codesys) = value; | |
3584 | |
3585 XCODING_SYSTEM_FIXED_WIDTH_INVALID_AND_QUERY_SKIP_CHARS (codesys) | |
3586 = concat2 (value, | |
3587 XCODING_SYSTEM_FIXED_WIDTH_INVALID_SEQUENCES_SKIP_CHARS | |
3588 (codesys)); | |
3589 } | |
3590 else | |
3591 { | |
3592 return 0; | |
3593 } | |
3594 | |
3595 return 1; | |
3596 } | |
3597 | |
3598 static Lisp_Object | |
3599 fixed_width_getprop (Lisp_Object codesys, Lisp_Object prop) | |
3600 { | |
3601 if (EQ (prop, Qdecode)) | |
3602 { | |
3603 return XCODING_SYSTEM_FIXED_WIDTH_DECODE (codesys); | |
3604 } | |
3605 else if (EQ (prop, Qencode)) | |
3606 { | |
3607 return XCODING_SYSTEM_FIXED_WIDTH_ENCODE (codesys); | |
3608 } | |
3609 else if (EQ (prop, Qfrom_unicode)) | |
3610 { | |
3611 return XCODING_SYSTEM_FIXED_WIDTH_FROM_UNICODE (codesys); | |
3612 } | |
3613 else if (EQ (prop, Qinvalid_sequences_skip_chars)) | |
3614 { | |
3615 /* Make sure Lisp can't make our data inconsistent: */ | |
3616 return | |
3617 Fcopy_sequence | |
3618 (XCODING_SYSTEM_FIXED_WIDTH_INVALID_SEQUENCES_SKIP_CHARS (codesys)); | |
3619 } | |
3620 else if (EQ (prop, Qquery_skip_chars)) | |
3621 { | |
3622 return | |
3623 Fcopy_sequence (XCODING_SYSTEM_FIXED_WIDTH_QUERY_SKIP_CHARS (codesys)); | |
3624 } | |
3625 | |
3626 return Qunbound; | |
3627 } | |
3628 | |
3629 static Lisp_Object Vfixed_width_query_ranges_cache; | |
3630 | |
3631 static Lisp_Object | |
3632 fixed_width_skip_chars_data_given_strings (Lisp_Object string, | |
3633 Lisp_Object query_skip_chars, | |
3634 Lisp_Object | |
3635 invalid_sequences_skip_chars, | |
3636 Binbyte *fastmap, | |
3637 int fastmap_len) | |
3638 { | |
3639 Lisp_Object result = Fgethash (string, | |
3640 Vfixed_width_query_ranges_cache, | |
3641 Qnil); | |
3642 REGISTER Ibyte *p, *pend; | |
3643 REGISTER Ichar c; | |
3644 | |
3645 memset (fastmap, query_coding_unencodable, fastmap_len); | |
3646 | |
3647 if (!NILP (result)) | |
3648 { | |
3649 int i; | |
3650 Lisp_Object ranged; | |
3651 assert (RANGE_TABLEP (result)); | |
3652 for (i = 0; i < fastmap_len; ++i) | |
3653 { | |
3654 ranged = Fget_range_table (make_int (i), result, Qnil); | |
3655 | |
3656 if (EQ (ranged, Qsucceeded)) | |
3657 { | |
3658 fastmap [i] = query_coding_succeeded; | |
3659 } | |
3660 else if (EQ (ranged, Qinvalid_sequence)) | |
3661 { | |
3662 fastmap [i] = query_coding_invalid_sequence; | |
3663 } | |
3664 } | |
3665 return result; | |
3666 } | |
3667 | |
3668 result = Fmake_range_table (Qstart_closed_end_closed); | |
3669 | |
3670 p = XSTRING_DATA (query_skip_chars); | |
3671 pend = p + XSTRING_LENGTH (query_skip_chars); | |
3672 | |
3673 while (p != pend) | |
3674 { | |
3675 c = itext_ichar (p); | |
3676 | |
3677 INC_IBYTEPTR (p); | |
3678 | |
3679 if (c == '\\') | |
3680 { | |
3681 if (p == pend) break; | |
3682 c = itext_ichar (p); | |
3683 INC_IBYTEPTR (p); | |
3684 } | |
3685 | |
3686 if (p != pend && *p == '-') | |
3687 { | |
3688 Ichar cend; | |
3689 | |
3690 /* Skip over the dash. */ | |
3691 p++; | |
3692 if (p == pend) break; | |
3693 cend = itext_ichar (p); | |
3694 | |
3695 Fput_range_table (make_int (c), make_int (cend), Qsucceeded, | |
3696 result); | |
3697 | |
3698 while (c <= cend && c < fastmap_len) | |
3699 { | |
3700 fastmap[c] = query_coding_succeeded; | |
3701 c++; | |
3702 } | |
3703 | |
3704 INC_IBYTEPTR (p); | |
3705 } | |
3706 else | |
3707 { | |
3708 if (c < fastmap_len) | |
3709 fastmap[c] = query_coding_succeeded; | |
3710 | |
3711 Fput_range_table (make_int (c), make_int (c), Qsucceeded, result); | |
3712 } | |
3713 } | |
3714 | |
3715 | |
3716 p = XSTRING_DATA (invalid_sequences_skip_chars); | |
3717 pend = p + XSTRING_LENGTH (invalid_sequences_skip_chars); | |
3718 | |
3719 while (p != pend) | |
3720 { | |
3721 c = itext_ichar (p); | |
3722 | |
3723 INC_IBYTEPTR (p); | |
3724 | |
3725 if (c == '\\') | |
3726 { | |
3727 if (p == pend) break; | |
3728 c = itext_ichar (p); | |
3729 INC_IBYTEPTR (p); | |
3730 } | |
3731 | |
3732 if (p != pend && *p == '-') | |
3733 { | |
3734 Ichar cend; | |
3735 | |
3736 /* Skip over the dash. */ | |
3737 p++; | |
3738 if (p == pend) break; | |
3739 cend = itext_ichar (p); | |
3740 | |
3741 Fput_range_table (make_int (c), make_int (cend), Qinvalid_sequence, | |
3742 result); | |
3743 | |
3744 while (c <= cend && c < fastmap_len) | |
3745 { | |
3746 fastmap[c] = query_coding_invalid_sequence; | |
3747 c++; | |
3748 } | |
3749 | |
3750 INC_IBYTEPTR (p); | |
3751 } | |
3752 else | |
3753 { | |
3754 if (c < fastmap_len) | |
3755 fastmap[c] = query_coding_invalid_sequence; | |
3756 | |
3757 Fput_range_table (make_int (c), make_int (c), Qinvalid_sequence, | |
3758 result); | |
3759 } | |
3760 } | |
3761 | |
3762 Fputhash (string, result, Vfixed_width_query_ranges_cache); | |
3763 | |
3764 return result; | |
3765 } | |
3766 | |
3767 static Lisp_Object | |
3768 fixed_width_query (Lisp_Object codesys, struct buffer *buf, | |
3769 Charbpos end, int flags) | |
3770 { | |
3771 Charbpos pos = BUF_PT (buf), fail_range_start, fail_range_end; | |
3772 Charbpos pos_byte = BYTE_BUF_PT (buf); | |
3773 Lisp_Object skip_chars_range_table, from_unicode, checked_unicode, | |
3774 result = Qnil; | |
3775 enum query_coding_failure_reasons failed_reason, | |
3776 previous_failed_reason = query_coding_succeeded; | |
3777 Binbyte fastmap[0xff]; | |
3778 | |
3779 from_unicode = XCODING_SYSTEM_FIXED_WIDTH_FROM_UNICODE (codesys); | |
3780 | |
3781 skip_chars_range_table = | |
3782 fixed_width_skip_chars_data_given_strings | |
3783 ((flags & QUERY_METHOD_IGNORE_INVALID_SEQUENCES ? | |
3784 XCODING_SYSTEM_FIXED_WIDTH_INVALID_AND_QUERY_SKIP_CHARS | |
3785 (codesys) : | |
3786 XCODING_SYSTEM_FIXED_WIDTH_QUERY_SKIP_CHARS(codesys)), | |
3787 XCODING_SYSTEM_FIXED_WIDTH_QUERY_SKIP_CHARS(codesys), | |
3788 (flags & QUERY_METHOD_IGNORE_INVALID_SEQUENCES ? | |
3789 build_string("") : | |
3790 XCODING_SYSTEM_FIXED_WIDTH_INVALID_SEQUENCES_SKIP_CHARS (codesys)), | |
3791 fastmap, (int)(sizeof (fastmap))); | |
3792 | |
3793 if (flags & QUERY_METHOD_HIGHLIGHT && | |
3794 /* If we're being called really early, live without highlights getting | |
3795 cleared properly: */ | |
3796 !(UNBOUNDP (XSYMBOL (Qquery_coding_clear_highlights)->function))) | |
3797 { | |
3798 /* It's okay to call Lisp here, the only non-stack object we may have | |
3799 allocated up to this point is skip_chars_range_table, and that's | |
3800 reachable from its entry in Vfixed_width_query_ranges_cache. */ | |
3801 call3 (Qquery_coding_clear_highlights, make_int (pos), make_int (end), | |
3802 wrap_buffer (buf)); | |
3803 } | |
3804 | |
3805 while (pos < end) | |
3806 { | |
3807 Ichar ch = BYTE_BUF_FETCH_CHAR (buf, pos_byte); | |
3808 if ((ch < (int) (sizeof(fastmap))) ? | |
3809 (fastmap[ch] == query_coding_succeeded) : | |
3810 (EQ (Qsucceeded, Fget_range_table (make_int (ch), | |
3811 skip_chars_range_table, Qnil)))) | |
3812 { | |
3813 pos++; | |
3814 INC_BYTEBPOS (buf, pos_byte); | |
3815 } | |
3816 else | |
3817 { | |
3818 fail_range_start = pos; | |
3819 while ((pos < end) && | |
3820 ((!(flags & QUERY_METHOD_IGNORE_INVALID_SEQUENCES) && | |
3821 EQ (Qinvalid_sequence, Fget_range_table | |
3822 (make_int (ch), skip_chars_range_table, Qnil)) | |
3823 && (failed_reason = query_coding_invalid_sequence)) | |
3824 || ((NILP ((checked_unicode = | |
3825 Fgethash (Fchar_to_unicode (make_char (ch)), | |
3826 from_unicode, Qnil)))) | |
3827 && (failed_reason = query_coding_unencodable))) | |
3828 && (previous_failed_reason == query_coding_succeeded | |
3829 || previous_failed_reason == failed_reason)) | |
3830 { | |
3831 pos++; | |
3832 INC_BYTEBPOS (buf, pos_byte); | |
3833 ch = BYTE_BUF_FETCH_CHAR (buf, pos_byte); | |
3834 previous_failed_reason = failed_reason; | |
3835 } | |
3836 | |
3837 if (fail_range_start == pos) | |
3838 { | |
3839 /* The character can actually be encoded; move on. */ | |
3840 pos++; | |
3841 INC_BYTEBPOS (buf, pos_byte); | |
3842 } | |
3843 else | |
3844 { | |
3845 assert (previous_failed_reason == query_coding_invalid_sequence | |
3846 || previous_failed_reason == query_coding_unencodable); | |
3847 | |
3848 if (flags & QUERY_METHOD_ERRORP) | |
3849 { | |
3850 DECLARE_EISTRING (error_details); | |
3851 | |
3852 eicpy_ascii (error_details, "Cannot encode "); | |
3853 eicat_lstr (error_details, | |
3854 make_string_from_buffer (buf, fail_range_start, | |
3855 pos - fail_range_start)); | |
3856 eicat_ascii (error_details, " using coding system"); | |
3857 | |
3858 signal_error (Qtext_conversion_error, | |
3859 (const CIbyte *)(eidata (error_details)), | |
3860 XCODING_SYSTEM_NAME (codesys)); | |
3861 } | |
3862 | |
3863 if (NILP (result)) | |
3864 { | |
3865 result = Fmake_range_table (Qstart_closed_end_open); | |
3866 } | |
3867 | |
3868 fail_range_end = pos; | |
3869 | |
3870 Fput_range_table (make_int (fail_range_start), | |
3871 make_int (fail_range_end), | |
3872 (previous_failed_reason | |
3873 == query_coding_unencodable ? | |
3874 Qunencodable : Qinvalid_sequence), | |
3875 result); | |
3876 previous_failed_reason = query_coding_succeeded; | |
3877 | |
3878 if (flags & QUERY_METHOD_HIGHLIGHT) | |
3879 { | |
3880 Lisp_Object extent | |
3881 = Fmake_extent (make_int (fail_range_start), | |
3882 make_int (fail_range_end), | |
3883 wrap_buffer (buf)); | |
3884 | |
3885 Fset_extent_priority | |
3886 (extent, make_int (2 + mouse_highlight_priority)); | |
3887 Fset_extent_face (extent, Qquery_coding_warning_face); | |
3888 } | |
3889 } | |
3890 } | |
3891 } | |
3892 | |
3893 return result; | |
3894 } | |
3391 | 3895 |
3392 | 3896 |
3393 /************************************************************************/ | 3897 /************************************************************************/ |
3394 /* Initialization */ | 3898 /* Initialization */ |
3395 /************************************************************************/ | 3899 /************************************************************************/ |
3428 DEFSYMBOL (Qiso_7); | 3932 DEFSYMBOL (Qiso_7); |
3429 DEFSYMBOL (Qiso_8_designate); | 3933 DEFSYMBOL (Qiso_8_designate); |
3430 DEFSYMBOL (Qiso_8_1); | 3934 DEFSYMBOL (Qiso_8_1); |
3431 DEFSYMBOL (Qiso_8_2); | 3935 DEFSYMBOL (Qiso_8_2); |
3432 DEFSYMBOL (Qiso_lock_shift); | 3936 DEFSYMBOL (Qiso_lock_shift); |
3937 | |
3938 DEFSYMBOL (Qfrom_unicode); | |
3939 DEFSYMBOL (Qinvalid_sequences_skip_chars); | |
3940 DEFSYMBOL (Qquery_skip_chars); | |
3941 DEFSYMBOL (Qfixed_width); | |
3433 } | 3942 } |
3434 | 3943 |
3435 void | 3944 void |
3436 coding_system_type_create_mule_coding (void) | 3945 coding_system_type_create_mule_coding (void) |
3437 { | 3946 { |
3463 CODING_SYSTEM_HAS_METHOD (ccl, init_coding_stream); | 3972 CODING_SYSTEM_HAS_METHOD (ccl, init_coding_stream); |
3464 CODING_SYSTEM_HAS_METHOD (ccl, rewind_coding_stream); | 3973 CODING_SYSTEM_HAS_METHOD (ccl, rewind_coding_stream); |
3465 CODING_SYSTEM_HAS_METHOD (ccl, putprop); | 3974 CODING_SYSTEM_HAS_METHOD (ccl, putprop); |
3466 CODING_SYSTEM_HAS_METHOD (ccl, getprop); | 3975 CODING_SYSTEM_HAS_METHOD (ccl, getprop); |
3467 | 3976 |
3977 INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA (fixed_width, | |
3978 "fixed-width-coding-system-p"); | |
3979 CODING_SYSTEM_HAS_METHOD (fixed_width, mark); | |
3980 CODING_SYSTEM_HAS_METHOD (fixed_width, convert); | |
3981 CODING_SYSTEM_HAS_METHOD (fixed_width, query); | |
3982 CODING_SYSTEM_HAS_METHOD (fixed_width, init); | |
3983 CODING_SYSTEM_HAS_METHOD (fixed_width, init_coding_stream); | |
3984 CODING_SYSTEM_HAS_METHOD (fixed_width, rewind_coding_stream); | |
3985 CODING_SYSTEM_HAS_METHOD (fixed_width, putprop); | |
3986 CODING_SYSTEM_HAS_METHOD (fixed_width, getprop); | |
3987 | |
3468 INITIALIZE_CODING_SYSTEM_TYPE (shift_jis, "shift-jis-coding-system-p"); | 3988 INITIALIZE_CODING_SYSTEM_TYPE (shift_jis, "shift-jis-coding-system-p"); |
3469 CODING_SYSTEM_HAS_METHOD (shift_jis, convert); | 3989 CODING_SYSTEM_HAS_METHOD (shift_jis, convert); |
3470 | 3990 |
3471 INITIALIZE_DETECTOR (shift_jis); | 3991 INITIALIZE_DETECTOR (shift_jis); |
3472 DETECTOR_HAS_METHOD (shift_jis, detect); | 3992 DETECTOR_HAS_METHOD (shift_jis, detect); |
3483 void | 4003 void |
3484 reinit_coding_system_type_create_mule_coding (void) | 4004 reinit_coding_system_type_create_mule_coding (void) |
3485 { | 4005 { |
3486 REINITIALIZE_CODING_SYSTEM_TYPE (iso2022); | 4006 REINITIALIZE_CODING_SYSTEM_TYPE (iso2022); |
3487 REINITIALIZE_CODING_SYSTEM_TYPE (ccl); | 4007 REINITIALIZE_CODING_SYSTEM_TYPE (ccl); |
4008 REINITIALIZE_CODING_SYSTEM_TYPE (fixed_width); | |
3488 REINITIALIZE_CODING_SYSTEM_TYPE (shift_jis); | 4009 REINITIALIZE_CODING_SYSTEM_TYPE (shift_jis); |
3489 REINITIALIZE_CODING_SYSTEM_TYPE (big5); | 4010 REINITIALIZE_CODING_SYSTEM_TYPE (big5); |
3490 } | 4011 } |
3491 | 4012 |
3492 void | 4013 void |
3495 } | 4016 } |
3496 | 4017 |
3497 void | 4018 void |
3498 vars_of_mule_coding (void) | 4019 vars_of_mule_coding (void) |
3499 { | 4020 { |
3500 } | 4021 /* This needs to be HASH_TABLE_EQ, there's a corner case where |
4022 HASH_TABLE_EQUAL won't work. */ | |
4023 Vfixed_width_query_ranges_cache | |
4024 = make_lisp_hash_table (32, HASH_TABLE_KEY_WEAK, HASH_TABLE_EQ); | |
4025 staticpro (&Vfixed_width_query_ranges_cache); | |
4026 } |