annotate lisp/unicode.el @ 4981:4aebb0131297

Cleanups/renaming of EXTERNAL_TO_C_STRING and friends

-------------------- ChangeLog entries follow: --------------------

modules/ChangeLog addition:

2010-02-05  Ben Wing  <ben@xemacs.org>

	* postgresql/postgresql.c:
	* postgresql/postgresql.c (CHECK_LIVE_CONNECTION):
	* postgresql/postgresql.c (Fpq_connectdb):
	* postgresql/postgresql.c (Fpq_connect_start):
	* postgresql/postgresql.c (Fpq_lo_import):
	* postgresql/postgresql.c (Fpq_lo_export):
	* ldap/eldap.c (Fldap_open):
	* ldap/eldap.c (Fldap_search_basic):
	* ldap/eldap.c (Fldap_add):
	* ldap/eldap.c (Fldap_modify):
	* ldap/eldap.c (Fldap_delete):
	* canna/canna_api.c (Fcanna_initialize):
	* canna/canna_api.c (Fcanna_store_yomi):
	* canna/canna_api.c (Fcanna_parse):
	* canna/canna_api.c (Fcanna_henkan_begin):
	EXTERNAL_TO_C_STRING returns its result instead of storing it in a
	parameter, and is renamed to EXTERNAL_TO_ITEXT. Similar things
	happen to related macros. See entry in src/ChangeLog.

	More Mule-izing of postgresql.c. Extract out common code between
	`pq-connectdb' and `pq-connect-start'. Fix places that signal an
	error using a formatted string so that they instead follow the
	standard: a fixed reason, followed by the particular error message
	stored as one of the frobs.
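The error convention described in the modules/ChangeLog entry (a fixed reason, with the particular error message carried separately as a frob) can be sketched in plain C. This is an illustration under stated assumptions, not XEmacs code: the struct `error_report` and the functions `report_discouraged` and `report_preferred` are hypothetical, and a plain string field stands in for a Lisp error object and its frobs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of an error object: a fixed, ASCII-only reason
   plus one "frob" holding data that came from outside (for example,
   a message returned by a database client library). */
struct error_report
{
  char reason[128];
  const char *frob;             /* NULL when there is no frob */
};

/* Discouraged pattern: the external message is formatted into the
   reason, so the reason is no longer a fixed string and may contain
   arbitrary external text. */
static void
report_discouraged (struct error_report *err, const char *external_msg)
{
  snprintf (err->reason, sizeof err->reason,
            "Connection failed: %s", external_msg);
  err->frob = NULL;
}

/* Preferred pattern: the reason is a fixed description of what was
   being attempted, and the external message travels separately. */
static void
report_preferred (struct error_report *err, const char *external_msg)
{
  snprintf (err->reason, sizeof err->reason, "%s", "Connection failed");
  err->frob = external_msg;
}
```

The fixed reason stays ASCII, as the src/ChangeLog entry requires of the "reason" parameter, while the externally retrieved text is kept apart from it.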
src/ChangeLog addition:

2010-02-05  Ben Wing  <ben@xemacs.org>

	* console-msw.c (write_string_to_mswindows_debugging_output):
	* console-msw.c (Fmswindows_message_box):
	* console-x.c (x_perhaps_init_unseen_key_defaults):
	* console.c:
	* database.c (dbm_get):
	* database.c (dbm_put):
	* database.c (dbm_remove):
	* database.c (berkdb_get):
	* database.c (berkdb_put):
	* database.c (berkdb_remove):
	* database.c (Fopen_database):
	* device-gtk.c (gtk_init_device):
	* device-msw.c (msprinter_init_device_internal):
	* device-msw.c (msprinter_default_printer):
	* device-msw.c (msprinter_init_device):
	* device-msw.c (sync_printer_with_devmode):
	* device-msw.c (Fmsprinter_select_settings):
	* device-x.c (sanity_check_geometry_resource):
	* device-x.c (Dynarr_add_validified_lisp_string):
	* device-x.c (x_init_device):
	* device-x.c (Fx_put_resource):
	* device-x.c (Fx_valid_keysym_name_p):
	* device-x.c (Fx_set_font_path):
	* dialog-msw.c (push_lisp_string_as_unicode):
	* dialog-msw.c (handle_directory_dialog_box):
	* dialog-msw.c (handle_file_dialog_box):
	* dialog-x.c (dbox_descriptor_to_widget_value):
	* editfns.c (Fformat_time_string):
	* editfns.c (Fencode_time):
	* editfns.c (Fset_time_zone_rule):
	* emacs.c (make_argc_argv):
	* emacs.c (Fdump_emacs):
	* emodules.c (emodules_load):
	* eval.c:
	* eval.c (maybe_signal_error_1):
	* event-msw.c (Fdde_alloc_advise_item):
	* event-msw.c (mswindows_dde_callback):
	* event-msw.c (mswindows_wnd_proc):
	* fileio.c (report_error_with_errno):
	* fileio.c (Fsysnetunam):
	* fileio.c (Fdo_auto_save):
	* font-mgr.c (extract_fcapi_string):
	* font-mgr.c (Ffc_config_app_font_add_file):
	* font-mgr.c (Ffc_config_app_font_add_dir):
	* font-mgr.c (Ffc_config_filename):
	* frame-gtk.c (gtk_set_frame_text_value):
	* frame-gtk.c (gtk_create_widgets):
	* frame-msw.c (mswindows_init_frame_1):
	* frame-msw.c (mswindows_set_title_from_ibyte):
	* frame-msw.c (msprinter_init_frame_3):
	* frame-x.c (x_set_frame_text_value):
	* frame-x.c (x_set_frame_properties):
	* frame-x.c (start_drag_internal_1):
	* frame-x.c (x_cde_transfer_callback):
	* frame-x.c (x_create_widgets):
	* glyphs-eimage.c (my_jpeg_output_message):
	* glyphs-eimage.c (jpeg_instantiate):
	* glyphs-eimage.c (gif_instantiate):
	* glyphs-eimage.c (png_instantiate):
	* glyphs-eimage.c (tiff_instantiate):
	* glyphs-gtk.c (xbm_instantiate_1):
	* glyphs-gtk.c (gtk_xbm_instantiate):
	* glyphs-gtk.c (gtk_xpm_instantiate):
	* glyphs-gtk.c (gtk_xface_instantiate):
	* glyphs-gtk.c (cursor_font_instantiate):
	* glyphs-gtk.c (gtk_redisplay_widget):
	* glyphs-gtk.c (gtk_widget_instantiate_1):
	* glyphs-gtk.c (gtk_add_tab_item):
	* glyphs-msw.c (mswindows_xpm_instantiate):
	* glyphs-msw.c (bmp_instantiate):
	* glyphs-msw.c (mswindows_resource_instantiate):
	* glyphs-msw.c (xbm_instantiate_1):
	* glyphs-msw.c (mswindows_xbm_instantiate):
	* glyphs-msw.c (mswindows_xface_instantiate):
	* glyphs-msw.c (mswindows_redisplay_widget):
	* glyphs-msw.c (mswindows_widget_instantiate):
	* glyphs-msw.c (add_tree_item):
	* glyphs-msw.c (add_tab_item):
	* glyphs-msw.c (mswindows_combo_box_instantiate):
	* glyphs-msw.c (mswindows_widget_query_string_geometry):
	* glyphs-x.c (x_locate_pixmap_file):
	* glyphs-x.c (xbm_instantiate_1):
	* glyphs-x.c (x_xbm_instantiate):
	* glyphs-x.c (extract_xpm_color_names):
	* glyphs-x.c (x_xpm_instantiate):
	* glyphs-x.c (x_xface_instantiate):
	* glyphs-x.c (autodetect_instantiate):
	* glyphs-x.c (safe_XLoadFont):
	* glyphs-x.c (cursor_font_instantiate):
	* glyphs-x.c (x_redisplay_widget):
	* glyphs-x.c (Fchange_subwindow_property):
	* glyphs-x.c (x_widget_instantiate):
	* glyphs-x.c (x_tab_control_redisplay):
	* glyphs.c (pixmap_to_lisp_data):
	* gui-x.c (menu_separator_style_and_to_external):
	* gui-x.c (add_accel_and_to_external):
	* gui-x.c (button_item_to_widget_value):
	* hpplay.c (player_error_internal):
	* hpplay.c (play_sound_file):
	* hpplay.c (play_sound_data):
	* intl.c (Fset_current_locale):
	* lisp.h:
	* menubar-gtk.c (gtk_xemacs_set_accel_keys):
	* menubar-msw.c (populate_menu_add_item):
	* menubar-msw.c (populate_or_checksum_helper):
	* menubar-x.c (menu_item_descriptor_to_widget_value_1):
	* nt.c (init_user_info):
	* nt.c (get_long_basename):
	* nt.c (nt_get_resource):
	* nt.c (init_mswindows_environment):
	* nt.c (get_cached_volume_information):
	* nt.c (mswindows_readdir):
	* nt.c (read_unc_volume):
	* nt.c (mswindows_stat):
	* nt.c (mswindows_getdcwd):
	* nt.c (mswindows_executable_type):
	* nt.c (Fmswindows_short_file_name):
	* ntplay.c (nt_play_sound_file):
	* objects-gtk.c:
	* objects-gtk.c (gtk_valid_color_name_p):
	* objects-gtk.c (gtk_initialize_font_instance):
	* objects-gtk.c (gtk_font_list):
	* objects-msw.c (font_enum_callback_2):
	* objects-msw.c (parse_font_spec):
	* objects-x.c (x_parse_nearest_color):
	* objects-x.c (x_valid_color_name_p):
	* objects-x.c (x_initialize_font_instance):
	* objects-x.c (x_font_instance_truename):
	* objects-x.c (x_font_list):
	* objects-xlike-inc.c (XFUN):
	* objects-xlike-inc.c (xft_find_charset_font):
	* process-nt.c (mswindows_report_winsock_error):
	* process-nt.c (nt_create_process):
	* process-nt.c (get_internet_address):
	* process-nt.c (nt_open_network_stream):
	* process-unix.c:
	* process-unix.c (allocate_pty):
	* process-unix.c (get_internet_address):
	* process-unix.c (unix_canonicalize_host_name):
	* process-unix.c (unix_open_network_stream):
	* realpath.c:
	* select-common.h (lisp_data_to_selection_data):
	* select-gtk.c (symbol_to_gtk_atom):
	* select-gtk.c (atom_to_symbol):
	* select-msw.c (symbol_to_ms_cf):
	* select-msw.c (mswindows_register_selection_data_type):
	* select-x.c (symbol_to_x_atom):
	* select-x.c (x_atom_to_symbol):
	* select-x.c (hack_motif_clipboard_selection):
	* select-x.c (Fx_store_cutbuffer_internal):
	* sound.c (Fplay_sound_file):
	* sound.c (Fplay_sound):
	* sound.h (sound_perror):
	* sysdep.c:
	* sysdep.c (qxe_allocating_getcwd):
	* sysdep.c (qxe_execve):
	* sysdep.c (copy_in_passwd):
	* sysdep.c (qxe_getpwnam):
	* sysdep.c (qxe_ctime):
	* sysdll.c (dll_open):
	* sysdll.c (dll_function):
	* sysdll.c (dll_variable):
	* sysdll.c (search_linked_libs):
	* sysdll.c (dll_error):
	* sysfile.h:
	* sysfile.h (PATHNAME_CONVERT_OUT_TSTR):
	* sysfile.h (PATHNAME_CONVERT_OUT_UTF_8):
	* sysfile.h (PATHNAME_CONVERT_OUT):
	* sysfile.h (LISP_PATHNAME_CONVERT_OUT):
	* syswindows.h (ITEXT_TO_TSTR):
	* syswindows.h (LOCAL_FILE_FORMAT_TO_TSTR):
	* syswindows.h (TSTR_TO_LOCAL_FILE_FORMAT):
	* syswindows.h (LOCAL_FILE_FORMAT_TO_INTERNAL_MSWIN):
	* syswindows.h (LISP_LOCAL_FILE_FORMAT_MAYBE_URL_TO_TSTR):
	* text.h:
	* text.h (eicpy_ext_len):
	* text.h (enum new_dfc_src_type):
	* text.h (EXTERNAL_TO_ITEXT):
	* text.h (GET_STRERROR):
	* tooltalk.c (check_status):
	* tooltalk.c (Fadd_tooltalk_message_arg):
	* tooltalk.c (Fadd_tooltalk_pattern_attribute):
	* tooltalk.c (Fadd_tooltalk_pattern_arg):
	* win32.c (tstr_to_local_file_format):
	* win32.c (mswindows_lisp_error_1):
	* win32.c (mswindows_report_process_error):
	* win32.c (Fmswindows_shell_execute):
	* win32.c (mswindows_read_link_1):
	Changes involving external/internal format conversion, mostly
	code cleanup and renaming.

	1. Eliminate the previous macros like LISP_STRING_TO_EXTERNAL that
	stored their result in a parameter. The new version of
	LISP_STRING_TO_EXTERNAL returns its result through the return
	value, same as the previous NEW_LISP_STRING_TO_EXTERNAL. Use the
	new-style macros throughout the code.

	2. Rename C_STRING_TO_EXTERNAL and friends to ITEXT_TO_EXTERNAL, in
	keeping with overall naming rationalization involving Itext and
	related types.

	Macros involved in the previous two:

	EXTERNAL_TO_C_STRING -> EXTERNAL_TO_ITEXT
	EXTERNAL_TO_C_STRING_MALLOC -> EXTERNAL_TO_ITEXT_MALLOC
	SIZED_EXTERNAL_TO_C_STRING -> SIZED_EXTERNAL_TO_ITEXT
	SIZED_EXTERNAL_TO_C_STRING_MALLOC -> SIZED_EXTERNAL_TO_ITEXT_MALLOC
	C_STRING_TO_EXTERNAL -> ITEXT_TO_EXTERNAL
	C_STRING_TO_EXTERNAL_MALLOC -> ITEXT_TO_EXTERNAL_MALLOC
	LISP_STRING_TO_EXTERNAL
	LISP_STRING_TO_EXTERNAL_MALLOC
	LISP_STRING_TO_TSTR
	C_STRING_TO_TSTR -> ITEXT_TO_TSTR
	TSTR_TO_C_STRING -> TSTR_TO_ITEXT

	The following four still return their values through parameters,
	since they have more than one value to return:

	C_STRING_TO_SIZED_EXTERNAL -> ITEXT_TO_SIZED_EXTERNAL
	LISP_STRING_TO_SIZED_EXTERNAL
	C_STRING_TO_SIZED_EXTERNAL_MALLOC -> ITEXT_TO_SIZED_EXTERNAL_MALLOC
	LISP_STRING_TO_SIZED_EXTERNAL_MALLOC

	Sometimes additional casts had to be inserted, since the old
	macros played strange games and completely defeated the type
	system of the store params.

	3. Replace many direct calls to TO_EXTERNAL_FORMAT with calls to
	one of the convenience macros listed above, or to make_extstring().

	4. Eliminate SIZED_C_STRING macros (they were hardly used, anyway)
	and use direct calls to TO_EXTERNAL_FORMAT or TO_INTERNAL_FORMAT.

	5. Use LISP_PATHNAME_CONVERT_OUT in many places instead of
	something like LISP_STRING_TO_EXTERNAL(..., Qfile_name).

	6. Eliminate some temporary variables that are no longer necessary
	now that we return a value rather than storing it into a variable.

	7. Some Mule-izing in database.c.

	8. Error functions:
	-- A bit of code cleanup in maybe_signal_error_1.
	-- Eliminate report_file_type_error; it's just an alias for
	   signal_error_2 with params in a different order.
	-- Fix some places in the hostname-handling code that directly
	   inserted externally-retrieved error strings into the supposedly
	   ASCII "reason" param, instead of putting text describing what
	   was going on in "reason" and the external message in a frob.

	9. Use Ascbyte instead of CIbyte in process-unix.c and maybe one or
	two other places.

	10. Some code cleanup in copy_in_passwd() in sysdep.c.

	11. Fix a real bug due to accidental variable shadowing in
	tstr_to_local_file_format() in win32.c.
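The difference between the old parameter-storing macros and the new value-returning ones (items 1 and 2, plus the note on sized conversions) can be illustrated with a minimal, self-contained C sketch. It does not use the real XEmacs macros: `convert_to_external`, `OLD_TO_EXTERNAL`, `NEW_TO_EXTERNAL` and `TO_SIZED_EXTERNAL` are hypothetical names, and ASCII upper-casing stands in for an actual coding-system conversion. Memory handling is simplified: every result here is malloc'd and must be freed by the caller, whereas the macro list above shows that the real macros come in _MALLOC and non-_MALLOC variants.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an internal-to-external conversion: copy
   IN into a freshly malloc'd buffer, upper-casing ASCII letters.
   Stores the byte count (without the terminator) in *LEN_OUT when
   LEN_OUT is non-NULL.  Returns NULL on allocation failure. */
static char *
convert_to_external (const char *in, size_t *len_out)
{
  size_t len = strlen (in);
  char *out = malloc (len + 1);
  if (out == NULL)
    return NULL;
  for (size_t i = 0; i < len; i++)
    out[i] = (char) toupper ((unsigned char) in[i]);
  out[len] = '\0';
  if (len_out != NULL)
    *len_out = len;
  return out;
}

/* Old style: the result is stored into a caller-supplied lvalue.
   Because of the (void *) cast, OUT may be a pointer of any object
   type, so the compiler cannot catch a mismatched destination. */
#define OLD_TO_EXTERNAL(in, out) \
  do { (out) = (void *) convert_to_external ((in), NULL); } while (0)

/* New style: the macro is an expression whose value is the result,
   so it can appear directly in an initializer or argument list, and
   assigning it to the wrong pointer type requires a diagnostic. */
#define NEW_TO_EXTERNAL(in) convert_to_external ((in), NULL)

/* Two results (data and size) still come back through parameters,
   as with the SIZED variants that keep the old calling style. */
#define TO_SIZED_EXTERNAL(in, out, outlen) \
  do { (out) = convert_to_external ((in), &(outlen)); } while (0)
```

With the old style, every conversion needs a separately declared destination variable; with the new style the call can sit directly where its value is used, which is what makes the temporary variables mentioned in the entry removable. A conversion that yields both a pointer and a length still needs out-parameters.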
author Ben Wing <ben@xemacs.org>
date Fri, 05 Feb 2010 11:02:24 -0600
parents b3ea9c582280
children c0934cef10c6
rev   line source
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
  1 ;;; unicode.el --- Unicode support -*- coding: iso-2022-7bit; -*-
  2
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
  3 ;; Copyright (C) 2001, 2002 Ben Wing.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
  4
  5 ;; Keywords: multilingual, Unicode
  6
  7 ;; This file is part of XEmacs.
  8
  9 ;; XEmacs is free software; you can redistribute it and/or modify it
 10 ;; under the terms of the GNU General Public License as published by
 11 ;; the Free Software Foundation; either version 2, or (at your option)
 12 ;; any later version.
 13
 14 ;; XEmacs is distributed in the hope that it will be useful, but
 15 ;; WITHOUT ANY WARRANTY; without even the implied warranty of
 16 ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 17 ;; General Public License for more details.
 18
 19 ;; You should have received a copy of the GNU General Public License
 20 ;; along with XEmacs; see the file COPYING. If not, write to the Free
 21 ;; Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
 22 ;; 02111-1307, USA.
 23
 24 ;;; Synched up with: Not in FSF.
 25
 26 ;;; Commentary:
 27
 28 ;; Lisp support for Unicode, e.g. initialize the translation tables.
 29
 30 ;;; Code:
 31
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
 32 ;; GNU Emacs has the charsets:
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
 33
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
 34 ;; mule-unicode-2500-33ff
 35 ;; mule-unicode-e000-ffff
 36 ;; mule-unicode-0100-24ff
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
 37
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
 38 ;; built-in. This is a hack, and an incomplete hack at that, against the
 39 ;; spirit and the letter of standard ISO 2022 character sets. Instead of
 40 ;; this, we have the jit-ucs-charset-N Mule character sets, created in
 41 ;; unicode.c on encountering a Unicode code point that we don't recognise,
 42 ;; and saved in ISO 2022 coding systems using the UTF-8 escape described in
 43 ;; ISO-IR 196.
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
 44
4083
a3f8bb07ab38 [xemacs-hg @ 2007-07-28 08:02:15 by aidan]
aidan
parents: 4072
 45 (eval-when-compile (when (featurep 'mule) (require 'ccl)))
 46
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2297
 47 ;; accessed in loadup.el, mule-cmds.el; see discussion in unicode.c
 48 (defvar load-unicode-tables-at-dump-time (eq system-type 'windows-nt)
 49   "[INTERNAL] Whether to load the Unicode tables at dump time.
 50 Setting this at run-time does nothing.")
 51
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
 52 ;; NOTE: This takes only a fraction of a second on my Pentium III
 53 ;; 700 MHz even with a totally optimization-disabled XEmacs.
 54 (defun load-unicode-tables ()
 55   "Initialize the Unicode translation tables for all standard charsets."
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
 56   (let ((parse-args
 57          '(("unicode/unicode-consortium"
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
 58             ;; Due to the braindamaged way Mule treats the ASCII and Control-1
 59             ;; charsets' types, trying to load them results in out-of-range
 60             ;; warnings at unicode.c:1439. They're no-ops anyway; they're
 61             ;; hardwired in unicode.c (unicode_to_ichar, ichar_to_unicode).
 62             ;; ("8859-1.TXT" ascii #x00 #x7F #x0)
 63             ;; ("8859-1.TXT" control-1 #x80 #x9F #x-80)
 64             ;; The 8859-1.TXT G1 assignments are half no-ops, hardwired in
 65             ;; unicode.c ichar_to_unicode, but not in unicode_to_ichar.
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
 66             ("8859-1.TXT" latin-iso8859-1 #xA0 #xFF #x-80)
 67             ;; "8859-10.TXT"
 68             ;; "8859-13.TXT"
 69             ("8859-14.TXT" latin-iso8859-14 #xA0 #xFF #x-80)
 70             ("8859-15.TXT" latin-iso8859-15 #xA0 #xFF #x-80)
2575
e71117a6ddac [xemacs-hg @ 2005-02-09 15:29:07 by aidan]
aidan
parents: 2574
 71             ("8859-16.TXT" latin-iso8859-16 #xA0 #xFF #x-80)
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
 72             ("8859-2.TXT" latin-iso8859-2 #xA0 #xFF #x-80)
 73             ("8859-3.TXT" latin-iso8859-3 #xA0 #xFF #x-80)
 74             ("8859-4.TXT" latin-iso8859-4 #xA0 #xFF #x-80)
 75             ("8859-5.TXT" cyrillic-iso8859-5 #xA0 #xFF #x-80)
4784
a67bfb29dd8b Dump the arabic-iso8859-6 character set, again.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4783
 76             ("8859-6.TXT" arabic-iso8859-6 #xA0 #xFF #x-80)
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
 77             ("8859-7.TXT" greek-iso8859-7 #xA0 #xFF #x-80)
 78             ("8859-8.TXT" hebrew-iso8859-8 #xA0 #xFF #x-80)
 79             ("8859-9.TXT" latin-iso8859-9 #xA0 #xFF #x-80)
 80             ;; charset for Big5 does not matter; specifying `big5' will
 81             ;; automatically make the right thing happen
 82             ("BIG5.TXT" chinese-big5-1 nil nil nil big5)
 83             ("CNS11643.TXT" chinese-cns11643-1 #x10000 #x1FFFF #x-10000)
 84             ("CNS11643.TXT" chinese-cns11643-2 #x20000 #x2FFFF #x-20000)
 85             ;; "CP1250.TXT"
 86             ;; "CP1251.TXT"
 87             ;; "CP1252.TXT"
 88             ;; "CP1253.TXT"
 89             ;; "CP1254.TXT"
 90             ;; "CP1255.TXT"
 91             ;; "CP1256.TXT"
 92             ;; "CP1257.TXT"
 93             ;; "CP1258.TXT"
 94             ;; "CP874.TXT"
 95             ;; "CP932.TXT"
 96             ;; "CP936.TXT"
 97             ;; "CP949.TXT"
 98             ;; "CP950.TXT"
 99             ;; "GB12345.TXT"
100             ("GB2312.TXT" chinese-gb2312)
2297
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
101             ;; "HANGUL.TXT"
102             ;; #### shouldn't JIS X 0201's upper limit be 7f?
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
103             ("JIS0201.TXT" latin-jisx0201 #x21 #x80)
104             ("JIS0201.TXT" katakana-jisx0201 #xA0 #xFF #x-80)
105             ("JIS0208.TXT" japanese-jisx0208 nil nil nil ignore-first-column)
106             ("JIS0212.TXT" japanese-jisx0212)
107             ;; "JOHAB.TXT"
108             ;; "KOI8-R.TXT"
109             ;; "KSC5601.TXT"
110             ;; Note that KSC5601.TXT as currently distributed is NOT what
111             ;; it claims to be! See comments in KSX1001.TXT.
112             ("KSX1001.TXT" korean-ksc5601)
113             ;; "OLD5601.TXT"
114             ;; "SHIFTJIS.TXT"
115             )
116            ("unicode/mule-ucs"
2297
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
117             ;; #### we don't support surrogates?!??
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
118             ;; use these instead of the above ones once we support surrogates
119             ;;("chinese-cns11643-1.txt" chinese-cns11643-1)
120             ;;("chinese-cns11643-2.txt" chinese-cns11643-2)
121             ;;("chinese-cns11643-3.txt" chinese-cns11643-3)
122             ;;("chinese-cns11643-4.txt" chinese-cns11643-4)
123             ;;("chinese-cns11643-5.txt" chinese-cns11643-5)
124             ;;("chinese-cns11643-6.txt" chinese-cns11643-6)
125             ;;("chinese-cns11643-7.txt" chinese-cns11643-7)
126             ("chinese-sisheng.txt" chinese-sisheng)
127             ("ethiopic.txt" ethiopic)
128             ("indian-is13194.txt" indian-is13194)
129             ("ipa.txt" ipa)
130             ("thai-tis620.txt" thai-tis620)
131             ("tibetan.txt" tibetan)
132             ("vietnamese-viscii-lower.txt" vietnamese-viscii-lower)
133             ("vietnamese-viscii-upper.txt" vietnamese-viscii-upper)
134             )
135            ("unicode/other"
136             ("lao.txt" lao)
137             )
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
138            )))
4783
e29fcfd8df5f Eliminate most core code byte-compile warnings.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4690
139     (mapc #'(lambda (tables)
140               (let ((undir
141                      (expand-file-name (car tables) data-directory)))
142                 (mapc #'(lambda (args)
143                           (apply 'load-unicode-mapping-table
144                                  (expand-file-name (car args) undir)
145                                  (cdr args)))
146                       (cdr tables))))
4145
edb00a8b4eff [xemacs-hg @ 2007-08-26 20:00:29 by aidan]
aidan
parents: 4096
147           parse-args)
148     ;; The default-unicode-precedence-list. We set this here to default to
149     ;; *not* mapping various European characters to East Asian characters;
150     ;; otherwise the default-unicode-precedence-list is numerically ordered
151     ;; by charset ID.
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
152 (declare-fboundp
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
153 (set-default-unicode-precedence-list
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
154 '(ascii control-1 latin-iso8859-1 latin-iso8859-2 latin-iso8859-15
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
155 greek-iso8859-7 hebrew-iso8859-8 ipa cyrillic-iso8859-5
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
156 latin-iso8859-16 latin-iso8859-3 latin-iso8859-4 latin-iso8859-9
4805
980575c76541 Move the arabic-iso8859-6 character set back to C, otherwise X11 lookup fails.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4784
diff changeset
157 vietnamese-viscii-lower vietnamese-viscii-upper arabic-iso8859-6
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
158 jit-ucs-charset-0 japanese-jisx0208 japanese-jisx0208-1978
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
159 japanese-jisx0212 japanese-jisx0213-1 japanese-jisx0213-2
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
160 chinese-gb2312 chinese-sisheng chinese-big5-1 chinese-big5-2
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
161 indian-is13194 korean-ksc5601 chinese-cns11643-1 chinese-cns11643-2
4491
d402d7b18bd8 Revamp the Arabic support. Create greek-iso-8bit-with-esc, arabic-iso-8bit-with-esc.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4468
diff changeset
162 chinese-isoir165
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
163 composite ethiopic indian-1-column indian-2-column jit-ucs-charset-0
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
164 katakana-jisx0201 lao thai-tis620 thai-xtis tibetan tibetan-1-column
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
165 latin-jisx0201 chinese-cns11643-3 chinese-cns11643-4
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
166 chinese-cns11643-5 chinese-cns11643-6 chinese-cns11643-7)))))
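The comment above explains that this precedence list decides which charset a Unicode code point maps to when more than one charset could hold it. The sketch below shows only that selection rule. It assumes the candidate charsets for a code point are already known. `my-pick-charset` is a hypothetical helper, not an XEmacs function, and it ignores the fallback to charset-ID order for charsets that are missing from the list.

```elisp
;; Hypothetical helper: return the first charset in PRECEDENCE that
;; also appears in CANDIDATES, or nil if none of them is listed.
(defun my-pick-charset (candidates precedence)
  (catch 'found
    (dolist (charset precedence)
      (when (memq charset candidates)
        (throw 'found charset)))
    nil))

;; A European charset listed before JIS X 0208 wins:
(my-pick-charset '(japanese-jisx0208 latin-iso8859-1)
                 '(ascii latin-iso8859-1 japanese-jisx0208))
;; => latin-iso8859-1
```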
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
167
4690
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
168 (defconst ccl-encode-to-ucs-2
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
169 (eval-when-compile
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
170 (let ((pre-existing
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
171 ;; This is the compiled CCL program from the assert
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
172 ;; below. Since this file is dumped and ccl.el isn't (and
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
173 ;; even when it was, it was dumped much later than this
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
174 ;; one), we can't compile the program at dump time. We can
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
175 ;; check at byte compile time that the program is as
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
176 ;; expected, though.
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
177 [1 16 131127 7 98872 65823 1307 5 -65536 65313 64833 1028
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
178 147513 8 82009 255 22]))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
179 (when (featurep 'mule)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
180 ;; Check that the pre-existing constant reflects the intended
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
181 ;; CCL program.
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
182 (assert
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
183 (equal pre-existing
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
184 (ccl-compile
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
185 `(1
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
186 ( ;; mule-to-unicode's first argument is the
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
187 ;; charset ID, the second its first byte
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
188 ;; left-shifted by 7 bits and ORed with its
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
189 ;; second byte.
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
190 (r1 = (r1 << 7))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
191 (r1 = (r1 | r2))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
192 (mule-to-unicode r0 r1)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
193 (if (r0 & ,(lognot #xFFFF))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
194 ;; Redisplay looks in r1 and r2 for the first
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
195 ;; and second bytes of the X11 font,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
196 ;; respectively. For non-BMP characters we
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
197 ;; display U+FFFD.
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
198 ((r1 = #xFF)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
199 (r2 = #xFD))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
200 ((r1 = (r0 >> 8))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
201 (r2 = (r0 & #xFF))))))))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
202 nil
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
203 "The pre-compiled CCL program appears broken. "))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
204 pre-existing))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
205 "CCL program to transform Mule characters to UCS-2.")
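The register arithmetic in `ccl-encode-to-ucs-2` can be mirrored in plain Lisp to make the bit layout concrete. This is a sketch under assumptions. `my-pack-mule-bytes` and `my-ucs2-font-bytes` are hypothetical names, and the `mule-to-unicode` lookup between the two steps is left out, so the second function takes the Unicode code point that lookup would have produced.

```elisp
;; Hypothetical helper: pack a two-byte Mule position the way the CCL
;; program does, r1 = (r1 << 7) | r2.  Mule position bytes are 7-bit,
;; so the OR never overlaps the shifted first byte.
(defun my-pack-mule-bytes (byte1 byte2)
  (logior (ash byte1 7) byte2))

;; Hypothetical helper: split CODE into the (FIRST . SECOND) font bytes
;; that redisplay reads from r1 and r2.  Any value with bits above
;; #xFFFF set (non-BMP, or negative) becomes U+FFFD, i.e. (#xFF . #xFD).
(defun my-ucs2-font-bytes (code)
  (if (/= 0 (logand code (lognot #xFFFF)))
      (cons #xFF #xFD)
    (cons (ash code -8) (logand code #xFF))))
```

For example, `(my-ucs2-font-bytes #x20AC)` gives `(32 . 172)`, that is #x20 and #xAC, while any non-BMP code point gives the U+FFFD bytes.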
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
206
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
207 (when (featurep 'mule)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
208 (put 'ccl-encode-to-ucs-2 'ccl-program-idx
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
209 (declare-fboundp
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
210 (register-ccl-program 'ccl-encode-to-ucs-2 ccl-encode-to-ucs-2))))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
211
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
212 (defun decode-char (quote-ucs code &optional restriction)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
213 "FSF compatibility--return Mule character with Unicode codepoint CODE.
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
214 The second argument must be 'ucs; the third argument is ignored."
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
215 ;; We're prepared to accept invalid Unicode in unicode-to-char, but not in
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
216 ;; this function, which is the API that should actually be used, since
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
217 ;; it's available in GNU and in Mule-UCS.
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
218 (check-argument-range code #x0 #x10FFFF)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
219 (assert (eq quote-ucs 'ucs) t
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
220 "Sorry, decode-char doesn't yet support anything but the UCS. ")
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
221 (unicode-to-char code))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
222
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
223 (defun encode-char (char quote-ucs &optional restriction)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
224 "FSF compatibility--return the Unicode code point of CHAR.
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
225 The second argument must be 'ucs; the third argument is ignored."
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
226 (assert (eq quote-ucs 'ucs) t
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
227 "Sorry, encode-char doesn't yet support anything but the UCS. ")
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
228 (char-to-unicode char))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4619
diff changeset
229
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
230 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
231 'utf-16 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
232 "UTF-16"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
233 '(mnemonic "UTF-16"
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
234 documentation
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
235 "UTF-16 Unicode encoding -- the standard (almost-) fixed-width
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
236 two-byte encoding, with surrogates. It will be fixed-width if all
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
237 characters are in the BMP (Basic Multilingual Plane -- first 65536
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
238 codepoints). Cannot represent characters with codepoints above
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
239 0x10FFFF (a little more than 1,000,000). Unicode and ISO guarantee
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
240 never to encode any characters outside this range -- all the rest are
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
241 for private, corporate or internal use."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
242 unicode-type utf-16))
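The docstring mentions surrogates without showing them. Below is a minimal sketch of how a code point above the BMP becomes a UTF-16 surrogate pair, using the standard Unicode formulas. `my-utf-16-code-units` is a hypothetical helper, not the C encoder XEmacs uses. It also does not reject lone surrogate code points (#xD800 to #xDFFF) passed in directly.

```elisp
;; Hypothetical helper: return the list of 16-bit UTF-16 code units
;; for CODE, using a surrogate pair for code points outside the BMP.
(defun my-utf-16-code-units (code)
  (cond ((or (< code 0) (> code #x10FFFF))
         (error "Code point out of UTF-16 range: %d" code))
        ((< code #x10000)
         (list code))                 ; BMP: one unit, unchanged
        (t
         (let ((offset (- code #x10000)))   ; 20 significant bits
           (list (+ #xD800 (ash offset -10))             ; high surrogate
                 (+ #xDC00 (logand offset #x3FF)))))))   ; low surrogate
```

For example, `(my-utf-16-code-units #x1F600)` returns `(55357 56832)`, that is #xD83D and #xDE00.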
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
243
2574
5e2653bc0ab0 [xemacs-hg @ 2005-02-08 23:59:50 by aidan]
aidan
parents: 2367
diff changeset
244 (define-coding-system-alias 'utf-16-be 'utf-16)
5e2653bc0ab0 [xemacs-hg @ 2005-02-08 23:59:50 by aidan]
aidan
parents: 2367
diff changeset
245
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
246 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
247 'utf-16-bom 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
248 "UTF-16 w/BOM"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
249 '(mnemonic "UTF16-BOM"
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
250 documentation
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
251 "UTF-16 Unicode encoding with byte order mark (BOM) at the beginning.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
252 The BOM is Unicode character U+FEFF -- i.e. the first two bytes are
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
253 0xFE and 0xFF, respectively, or reversed in a little-endian
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
254 representation. It has been sanctioned by the Unicode Consortium for
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
255 use at the beginning of a Unicode stream as a marker of the byte order
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
256 of the stream, and commonly appears in Unicode files under Microsoft
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
257 Windows, where it also functions as a magic cookie identifying a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
258 Unicode file. The character is called \"ZERO WIDTH NO-BREAK SPACE\"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
259 and is suitable as a byte-order marker because:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
260
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
261 -- it has no displayable representation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
262 -- due to its semantics it never normally appears at the beginning
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
263 of a stream
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
264 -- its reverse U+FFFE is not a legal Unicode character
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
265 -- neither byte sequence is at all likely in any other standard
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
266 encoding, particularly at the beginning of a stream
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
267
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
268 This coding system will insert a BOM at the beginning of a stream when
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
269 writing and strip it off when reading."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
270 unicode-type utf-16
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
271 need-bom t))
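Because U+FFFE is not a character, the two BOM bytes identify the byte order unambiguously. Here is a small sketch of that check, assuming the caller already has the first two bytes of the stream as integers. `my-utf-16-bom-order` is hypothetical and only illustrates the reasoning in the docstring. It is not how the `utf-16-bom` decoder is implemented.

```elisp
;; Hypothetical helper: given the first two bytes of a stream, return
;; the byte order a UTF-16 BOM (U+FEFF) indicates, or nil if absent.
(defun my-utf-16-bom-order (byte1 byte2)
  (cond ((and (= byte1 #xFE) (= byte2 #xFF)) 'big-endian)
        ((and (= byte1 #xFF) (= byte2 #xFE)) 'little-endian)
        (t nil)))
```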
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
272
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
273 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
274 'utf-16-little-endian 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
275 "UTF-16 Little Endian"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
276 '(mnemonic "UTF16-LE"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
277 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
278 "Little-endian version of UTF-16 Unicode encoding.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
279 See `utf-16' coding system."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
280 unicode-type utf-16
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
281 little-endian t))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
282
2574
5e2653bc0ab0 [xemacs-hg @ 2005-02-08 23:59:50 by aidan]
aidan
parents: 2367
diff changeset
283 (define-coding-system-alias 'utf-16-le 'utf-16-little-endian)
5e2653bc0ab0 [xemacs-hg @ 2005-02-08 23:59:50 by aidan]
aidan
parents: 2367
diff changeset
284
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
285 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
286 'utf-16-little-endian-bom 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
287 "UTF-16 Little Endian w/BOM"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
288 '(mnemonic "MSW-Unicode"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
289 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
290 "Little-endian version of UTF-16 Unicode encoding, with byte order mark.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
291 Standard encoding for representing Unicode under MS Windows. See
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
292 `utf-16-bom' coding system."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
293 unicode-type utf-16
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
294 little-endian t
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
295 need-bom t))
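On the writing side, a little-endian stream with a BOM starts with the bytes #xFF #xFE and then stores each 16-bit unit low byte first. The sketch below assumes its input is already a list of UTF-16 code units, with any surrogate pairs already split out. `my-utf-16-le-bom-bytes` is a hypothetical helper.

```elisp
;; Hypothetical helper: serialize 16-bit code UNITS as little-endian
;; bytes, preceded by the BOM U+FEFF written as #xFF #xFE.
(defun my-utf-16-le-bom-bytes (units)
  (let ((bytes (list #xFE #xFF)))       ; BOM, accumulated in reverse
    (dolist (unit units)
      (push (logand unit #xFF) bytes)   ; low byte first...
      (push (ash unit -8) bytes))       ; ...then high byte
    (nreverse bytes)))
```

For example, `(my-utf-16-le-bom-bytes '(#x41 #x20AC))` yields the bytes #xFF #xFE #x41 #x00 #xAC #x20.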
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
296
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
297 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
298 'ucs-4 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
299 "UCS-4"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
300 '(mnemonic "UCS4"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
301 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
302 "UCS-4 Unicode encoding -- fully fixed-width four-byte encoding."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
303 unicode-type ucs-4))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
304
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
305 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
306 'ucs-4-little-endian 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
307 "UCS-4 Little Endian"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
308 '(mnemonic "UCS4-LE"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
309 documentation
2297
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
310 ;; #### I don't think this is permitted by ISO 10646, only Unicode.
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
311 ;; Call it UTF-32 instead?
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
312 "Little-endian version of UCS-4 Unicode encoding. See `ucs-4' coding system."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
313 unicode-type ucs-4
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
314 little-endian t))
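UCS-4 stores every code point in exactly four bytes, so the two byte orders differ only in the order of those bytes. The sketch below uses a hypothetical helper, `my-ucs-4-bytes`, which returns the bytes as a list of integers rather than writing them to a stream.

```elisp
;; Hypothetical helper: the four bytes of CODE in UCS-4, big-endian by
;; default, little-endian when LITTLE-ENDIAN is non-nil.
(defun my-ucs-4-bytes (code &optional little-endian)
  (let ((bytes (list (logand (ash code -24) #xFF)
                     (logand (ash code -16) #xFF)
                     (logand (ash code -8) #xFF)
                     (logand code #xFF))))
    (if little-endian (nreverse bytes) bytes)))
```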
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
315
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
316 (make-coding-system
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
317 'utf-32 'unicode
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
318 "UTF-32"
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
319 '(mnemonic "UTF32"
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
320 documentation
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
321 "UTF-32 Unicode encoding -- fixed-width four-byte encoding,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
322 characters greater than #x10FFFF are not supported."
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
323 unicode-type utf-32))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
324
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
325 (make-coding-system
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
326 'utf-32-little-endian 'unicode
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
327 "UTF-32 Little Endian"
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
328 '(mnemonic "UTF32-LE"
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
329 documentation
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
330 "Little-endian version of UTF-32 Unicode encoding.
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
331
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
332 A fixed-width four-byte encoding; characters greater than #x10FFFF are not
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
333 supported. "
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
334 unicode-type ucs-4 little-endian t))
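Decoding works in the opposite direction. The four bytes are assembled into an integer, and any value UTF-32 cannot represent, meaning anything above #x10FFFF, is rejected. The sketch below handles the little-endian case. `my-utf-32-le-decode` is hypothetical, and it does not check for surrogate code points, which the Unicode standard also excludes from UTF-32.

```elisp
;; Hypothetical helper: decode four little-endian bytes B0..B3 as a
;; UTF-32 code point, signalling an error above #x10FFFF.
(defun my-utf-32-le-decode (b0 b1 b2 b3)
  (let ((code (logior b0 (ash b1 8) (ash b2 16) (ash b3 24))))
    (if (> code #x10FFFF)
        (error "Not a valid UTF-32 code point: #x%X" code)
      code)))
```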
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 4083
diff changeset
335
4834
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4805
diff changeset
336 ;; Now defined in unicode.c.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
337
4834
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4805
diff changeset
338 ;;(make-coding-system
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4805
diff changeset
339 ;; 'utf-8 'unicode
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4805
diff changeset
340 ;; "UTF-8"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4805
diff changeset
341 ;; '(mnemonic "UTF8"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4805
diff changeset
342 ;; documentation "..."
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4805
diff changeset
343 ;; unicode-type utf-8))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
344
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
345 (make-coding-system
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
346 'utf-8-bom 'unicode
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
347 "UTF-8 w/BOM"
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
348 '(mnemonic "MSW-UTF8"
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
349 documentation
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
350 "UTF-8 Unicode encoding, with byte order mark.
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
351 Standard encoding for representing UTF-8 under MS Windows."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
352 unicode-type utf-8
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
353 little-endian t
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
354 need-bom t))
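UTF-8 has no byte-order ambiguity, so in this encoding the BOM (U+FEFF, encoded as #xEF #xBB #xBF) serves only as a signature that marks the file as UTF-8. The sketch below strips it on read, assuming the input is a list of byte values. `my-strip-utf-8-bom` is a hypothetical helper, not the decoder used by `utf-8-bom`.

```elisp
;; Hypothetical helper: drop a leading UTF-8 BOM from a list of BYTES.
;; `nth' returns nil past the end, so short inputs are returned as-is.
(defun my-strip-utf-8-bom (bytes)
  (if (and (eql (nth 0 bytes) #xEF)
           (eql (nth 1 bytes) #xBB)
           (eql (nth 2 bytes) #xBF))
      (nthcdr 3 bytes)
    bytes))
```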
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
355
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
356 ;; Now, create jit-ucs-charset-0 entries for those characters in Windows
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
357 ;; Glyph List 4 that would otherwise end up in East Asian character sets.
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
358 ;;
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
359 ;; WGL4 is a character repertoire from Microsoft that gives a guideline
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
360 ;; for font implementors as to what characters are sufficient for
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
361 ;; pan-European support. The intention of this code is to avoid the
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
362 ;; situation where these characters end up mapping to East Asian XEmacs
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
363 ;; characters, which generally clash strongly with European characters
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
364 ;; both in font choice and character width; jit-ucs-charset-0 is a
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
365 ;; single-width character set which comes before the East Asian character
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
366 ;; sets in the default-unicode-precedence-list above.
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
367 (loop for (ucs ascii-or-latin-1)
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
368 in '((#x2013 ?-) ;; U+2013 EN DASH
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
369 (#x2014 ?-) ;; U+2014 EM DASH
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
370 (#x2105 ?%) ;; U+2105 CARE OF
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
371 (#x203e ?-) ;; U+203E OVERLINE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
372 (#x221f ?|) ;; U+221F RIGHT ANGLE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
373 (#x2584 ?|) ;; U+2584 LOWER HALF BLOCK
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
374 (#x2588 ?|) ;; U+2588 FULL BLOCK
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
375 (#x258c ?|) ;; U+258C LEFT HALF BLOCK
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
376 (#x2550 ?|) ;; U+2550 BOX DRAWINGS DOUBLE HORIZONTAL
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
377 (#x255e ?|) ;; U+255E BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
378 (#x256a ?|) ;; U+256A BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
379 (#x2561 ?|) ;; U+2561 BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
380 (#x2215 ?/) ;; U+2215 DIVISION SLASH
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
381 (#x02c9 ?`) ;; U+02C9 MODIFIER LETTER MACRON
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
382 (#x2211 ?s) ;; U+2211 N-ARY SUMMATION
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
383 (#x220f ?s) ;; U+220F N-ARY PRODUCT
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
384 (#x2248 ?=) ;; U+2248 ALMOST EQUAL TO
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
385 (#x2264 ?=) ;; U+2264 LESS-THAN OR EQUAL TO
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
386 (#x2265 ?=) ;; U+2265 GREATER-THAN OR EQUAL TO
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
387 (#x201c ?') ;; U+201C LEFT DOUBLE QUOTATION MARK
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
388 (#x2026 ?.) ;; U+2026 HORIZONTAL ELLIPSIS
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
389 (#x2212 ?-) ;; U+2212 MINUS SIGN
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
390 (#x2260 ?=) ;; U+2260 NOT EQUAL TO
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
391 (#x221e ?=) ;; U+221E INFINITY
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
392 (#x2642 ?=) ;; U+2642 MALE SIGN
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
393 (#x2640 ?=) ;; U+2640 FEMALE SIGN
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
394 (#x2032 ?=) ;; U+2032 PRIME
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
395 (#x2033 ?=) ;; U+2033 DOUBLE PRIME
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
396 (#x25cb ?=) ;; U+25CB WHITE CIRCLE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
397 (#x25cf ?=) ;; U+25CF BLACK CIRCLE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
398 (#x25a1 ?=) ;; U+25A1 WHITE SQUARE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
399 (#x25a0 ?=) ;; U+25A0 BLACK SQUARE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
400 (#x25b2 ?=) ;; U+25B2 BLACK UP-POINTING TRIANGLE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
401 (#x25bc ?=) ;; U+25BC BLACK DOWN-POINTING TRIANGLE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
402 (#x2192 ?=) ;; U+2192 RIGHTWARDS ARROW
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
403 (#x2190 ?=) ;; U+2190 LEFTWARDS ARROW
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
404 (#x2191 ?=) ;; U+2191 UPWARDS ARROW
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
405 (#x2193 ?=) ;; U+2193 DOWNWARDS ARROW
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
406 (#x2229 ?=) ;; U+2229 INTERSECTION
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
407 (#x2202 ?=) ;; U+2202 PARTIAL DIFFERENTIAL
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
408 (#x2261 ?=) ;; U+2261 IDENTICAL TO
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
409 (#x221a ?=) ;; U+221A SQUARE ROOT
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
410 (#x222b ?=) ;; U+222B INTEGRAL
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
411 (#x2030 ?=) ;; U+2030 PER MILLE SIGN
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
412 (#x266a ?=) ;; U+266A EIGHTH NOTE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
413 (#x2020 ?*) ;; U+2020 DAGGER
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
414 (#x2021 ?*) ;; U+2021 DOUBLE DAGGER
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
415 (#x2500 ?|) ;; U+2500 BOX DRAWINGS LIGHT HORIZONTAL
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
416 (#x2502 ?|) ;; U+2502 BOX DRAWINGS LIGHT VERTICAL
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
417 (#x250c ?|) ;; U+250C BOX DRAWINGS LIGHT DOWN AND RIGHT
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
418 (#x2510 ?|) ;; U+2510 BOX DRAWINGS LIGHT DOWN AND LEFT
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
419 (#x2518 ?|) ;; U+2518 BOX DRAWINGS LIGHT UP AND LEFT
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
420 (#x2514 ?|) ;; U+2514 BOX DRAWINGS LIGHT UP AND RIGHT
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
421 (#x251c ?|) ;; U+251C BOX DRAWINGS LIGHT VERTICAL AND RIGHT
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
422 (#x252c ?|) ;; U+252C BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
423 (#x2524 ?|) ;; U+2524 BOX DRAWINGS LIGHT VERTICAL AND LEFT
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
424 (#x2534 ?|) ;; U+2534 BOX DRAWINGS LIGHT UP AND HORIZONTAL
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
425 (#x253c ?|) ;; U+253C BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
426 (#x02da ?^) ;; U+02DA RING ABOVE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
427 (#x2122 ?\xa9) ;; U+2122 TRADE MARK SIGN, ?©
4145
edb00a8b4eff [xemacs-hg @ 2007-08-26 20:00:29 by aidan]
aidan
parents: 4096
diff changeset
428
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
429 (#x0132 ?\xe6) ;; U+0132 LATIN CAPITAL LIGATURE IJ, ?æ
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
430 (#x013f ?\xe6) ;; U+013F LATIN CAPITAL LETTER L WITH MIDDLE DOT, ?æ
4145
edb00a8b4eff [xemacs-hg @ 2007-08-26 20:00:29 by aidan]
aidan
parents: 4096
diff changeset
431
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
432 (#x0133 ?\xe6) ;; U+0133 LATIN SMALL LIGATURE IJ, ?æ
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
433 (#x0140 ?\xe6) ;; U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT, ?æ
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
434 (#x0149 ?\xe6) ;; U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE, ?æ
4145
edb00a8b4eff [xemacs-hg @ 2007-08-26 20:00:29 by aidan]
aidan
parents: 4096
diff changeset
435
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
436 (#x2194 ?|) ;; U+2194 LEFT RIGHT ARROW
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
437 (#x2660 ?*) ;; U+2660 BLACK SPADE SUIT
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
438 (#x2665 ?*) ;; U+2665 BLACK HEART SUIT
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
439 (#x2663 ?*) ;; U+2663 BLACK CLUB SUIT
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
440 (#x2592 ?|) ;; U+2592 MEDIUM SHADE
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
441 (#x2195 ?|) ;; U+2195 UP DOWN ARROW
4145
edb00a8b4eff [xemacs-hg @ 2007-08-26 20:00:29 by aidan]
aidan
parents: 4096
diff changeset
442
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
443 (#x2113 ?\xb9) ;; U+2113 SCRIPT SMALL L, ?¹
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
444 (#x215b ?\xbe) ;; U+215B VULGAR FRACTION ONE EIGHTH, ?¾
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
445 (#x215c ?\xbe) ;; U+215C VULGAR FRACTION THREE EIGHTHS, ?¾
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
446 (#x215d ?\xbe) ;; U+215D VULGAR FRACTION FIVE EIGHTHS, ?¾
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
447 (#x215e ?\xbe) ;; U+215E VULGAR FRACTION SEVEN EIGHTHS, ?¾
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
448 (#x207f ?\xbe) ;; U+207F SUPERSCRIPT LATIN SMALL LETTER N, ?¾
4145
edb00a8b4eff [xemacs-hg @ 2007-08-26 20:00:29 by aidan]
aidan
parents: 4096
diff changeset
449
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
450 ;; These are not in WGL 4, but are IPA characters that should not
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
451 ;; be double width. They are the only IPA characters that both
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
452 ;; occur in packages/mule-packages/leim/ipa.el and end up in East
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
453 ;; Asian character sets when that file is loaded in an XEmacs
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
454 ;; without packages.
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
455 (#x2197 ?|) ;; U+2197 NORTH EAST ARROW
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
456 (#x2199 ?|) ;; U+2199 SOUTH WEST ARROW
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
457 (#x2191 ?|) ;; U+2191 UPWARDS ARROW
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
458 (#x207f ?\xb9)) ;; U+207F SUPERSCRIPT LATIN SMALL LETTER N, ?¹
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
459 with decoded = nil
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
460 with syntax-table = (standard-syntax-table)
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
461 initially (unless (featurep 'mule) (return))
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
462 ;; This creates jit-ucs-charset-0 entries because:
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
463 ;;
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
464 ;; 1. If the tables are dumped, it is run at dump time before they are
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
465 ;; dumped, and as such before the relevant conversions are available
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
466 ;; (they are made available in mule/general-late.el).
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
467 ;;
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
468 ;; 2. If the tables are not dumped, it is run at dump time, long before
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
469 ;; any of the other mappings are available.
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
470 ;;
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
471 do
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
472 (setq decoded (decode-char 'ucs ucs))
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
473 (assert (eq (declare-fboundp (char-charset decoded))
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
474 'jit-ucs-charset-0) nil
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
475 "Unexpected Unicode decoding behavior. ")
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
476 (modify-syntax-entry decoded
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
477 (string
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
478 (char-syntax ascii-or-latin-1))
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
479 syntax-table))
4145
edb00a8b4eff [xemacs-hg @ 2007-08-26 20:00:29 by aidan]
aidan
parents: 4096
diff changeset
480
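The loop above gives each listed Unicode character the syntax class of its ASCII or Latin-1 stand-in: it reads `char-syntax' of the stand-in and passes the result to `modify-syntax-entry'. What follows is a minimal portable sketch of that idea, not code from unicode.el. The function name `my-copy-syntax-from-fallbacks' is hypothetical. The sketch also assumes an Emacs where characters are integers equal to their Unicode code points, such as GNU Emacs 23 or later. In XEmacs you would first map each code point through `decode-char', as the loop does.

```elisp
;; Hypothetical helper, not part of unicode.el.  Assumes characters are
;; integers equal to their Unicode code points (GNU Emacs 23+).
(defun my-copy-syntax-from-fallbacks (pairs table)
  "For each (CHAR FALLBACK) in PAIRS, give CHAR the syntax of FALLBACK in TABLE.
The syntax of FALLBACK is read from the current buffer's syntax table.
Returns TABLE."
  (dolist (pair pairs table)
    (modify-syntax-entry (car pair)
                         ;; `char-syntax' returns a class character such
                         ;; as ?_ or ?. ; wrap it in a one-char descriptor.
                         (string (char-syntax (cadr pair)))
                         table)))
```

As in the original loop, the stand-in's class is read from whatever syntax table is current. The stand-ins should therefore already have their intended classes before this runs.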
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
481 ;; *Sigh*, declarations need to be at the start of the line to be picked up
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
482 ;; by make-docfile. Not so much an issue with ccl-encode-to-ucs-2, which we
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
483 ;; don't necessarily want to advertise, but the following are important.
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
484
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
485 ;; Create all the Unicode error sequences, normally as jit-ucs-charset-0
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
486 ;; characters starting at U+200000 (which isn't a valid Unicode code
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
487 ;; point). Make them available to user code.
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
488 (defvar unicode-error-default-translation-table
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
489 (loop
4468
a78d697ccd2c Import and extend GNU's descr-text.el, supporting prefix argument for C-x =
Aidan Kehoe <kehoea@parhasard.net>
parents: 4317
diff changeset
490 with char-table = (make-char-table 'generic)
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
491 for i from ?\x00 to ?\xFF
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
492 initially (unless (featurep 'mule) (return))
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
493 do
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
494 (put-char-table (aref
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
495 ;; #xd800 is the first leading surrogate;
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
496 ;; trailing surrogates must be in the range
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
497 ;; #xdc00-#xdfff. These examples are not, so we
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
498 ;; intentionally provoke an error sequence.
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
499 (decode-coding-string (format "\xd8\x00\x00%c" i)
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
500 'utf-16-be)
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
501 3)
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
502 i
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
503 char-table)
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
504 finally return char-table)
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
505 "Translation table mapping Unicode error sequences to Latin-1 chars.
4145
edb00a8b4eff [xemacs-hg @ 2007-08-26 20:00:29 by aidan]
aidan
parents: 4096
diff changeset
506
4202
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
507 To transform XEmacs Unicode error sequences to the Latin-1 characters that
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
508 correspond to the octets on disk, you can use this variable. ")
4145
edb00a8b4eff [xemacs-hg @ 2007-08-26 20:00:29 by aidan]
aidan
parents: 4096
diff changeset
509
4490
67fbcaf3dbdc error-sequence -> invalid-sequence
Aidan Kehoe <kehoea@parhasard.net>
parents: 4489
diff changeset
510 (defvar unicode-invalid-sequence-regexp-range
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
511 (and (featurep 'mule)
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
512 (format "%c%c-%c"
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
513 (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 0)
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
514 (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3)
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
515 (aref (decode-coding-string "\xd8\x00\x00\xFF" 'utf-16-be) 3)))
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
516 "Regular expression range to match Unicode error sequences in XEmacs.
4145
edb00a8b4eff [xemacs-hg @ 2007-08-26 20:00:29 by aidan]
aidan
parents: 4096
diff changeset
517
4202
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
518 Invalid Unicode sequences on input are represented as XEmacs
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
519 characters with values stored as the keys in
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
520 `unicode-error-default-translation-table', one character for each
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
521 invalid octet. You can use this variable (with `re-search-forward' or
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
522 `skip-chars-forward') to search for such characters; see also
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
523 `unicode-error-translate-region'. ")
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
524
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
525 ;; Check that the lookup table is correct, and that all the actual error
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
526 ;; sequences are caught by the regexp.
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
527 (with-temp-buffer
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
528 (loop
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
529 for i from ?\x00 to ?\xFF
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
530 with to-check = (make-string 20 ?\x20)
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
531 initially (unless (featurep 'mule) (return))
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
532 do
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
533 (delete-region (point-min) (point-max))
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
534 (insert to-check)
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
535 (goto-char 10)
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
536 (insert (decode-coding-string (format "\xd8\x00\x00%c" i)
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
537 'utf-16-be))
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
538 (backward-char)
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
539 (assert (= i (get-char-table (char-after (point))
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
540 unicode-error-default-translation-table))
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
541 (format "Char ?\\x%x not the expected error sequence!"
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
542 i))
4202
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
543
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
544 (goto-char (point-min))
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
545 ;; Commented out until the issue in
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
546 ;; 18179.49815.622843.336527@parhasard.net is fixed.
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
547 (assert t ; (re-search-forward (concat "["
4490
67fbcaf3dbdc error-sequence -> invalid-sequence
Aidan Kehoe <kehoea@parhasard.net>
parents: 4489
diff changeset
548 ; unicode-invalid-sequence-regexp-range
4317
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
549 ; "]"))
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
550 nil
15d36164ebd7 Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4268
diff changeset
551 (format "Could not find char ?\\x%x in buffer" i))))
4202
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
552
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
553 (defun frob-unicode-errors-region (frob-function begin end &optional buffer)
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
554 "Call FROB-FUNCTION on the Unicode error sequences between BEGIN and END.
4202
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
555
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
556 Optional argument BUFFER specifies the buffer that should be examined for
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
557 such sequences. "
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
558 (check-argument-type #'functionp frob-function)
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
559 (check-argument-range begin (point-min buffer) (point-max buffer))
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4222
diff changeset
560 (check-argument-range end (point-min buffer) (point-max buffer))
4202
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
561 (save-excursion
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
562 (save-restriction
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
563 (if buffer (set-buffer buffer))
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
564 (narrow-to-region begin end)
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
565 (goto-char (point-min))
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
566 (while end
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
567 (setq begin
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
568 (progn
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
569 (skip-chars-forward
4490
67fbcaf3dbdc error-sequence -> invalid-sequence
Aidan Kehoe <kehoea@parhasard.net>
parents: 4489
diff changeset
570 (concat "^" unicode-invalid-sequence-regexp-range))
4202
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
571 (point))
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
572 end (and (not (= (point) (point-max)))
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
573 (progn
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
574 (skip-chars-forward
4490
67fbcaf3dbdc error-sequence -> invalid-sequence
Aidan Kehoe <kehoea@parhasard.net>
parents: 4489
diff changeset
575 unicode-invalid-sequence-regexp-range)
4202
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
576 (point))))
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
577 (if end
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
578 (funcall frob-function begin end))))))
a7c5de5b9880 [xemacs-hg @ 2007-10-02 10:33:04 by aidan]
aidan
parents: 4145
diff changeset
579
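`frob-unicode-errors-region' alternates between two calls to `skip-chars-forward'. The first uses the negated range to step over ordinary text. The second uses the range itself to consume a maximal run of error characters. The bounds of each run are then passed to FROB-FUNCTION. Below is a minimal portable sketch of the same scan, not code from unicode.el. It collects the runs instead of calling a function, and it uses an ordinary range such as "0-9" in place of `unicode-invalid-sequence-regexp-range'. The name `my-collect-runs' is hypothetical.

```elisp
;; Hypothetical helper, not part of unicode.el.
(defun my-collect-runs (range)
  "Return a list of (BEGIN . END) pairs for maximal runs of chars in RANGE.
RANGE uses `skip-chars-forward' syntax, e.g. \"0-9\".  The accessible
portion of the current buffer is scanned; point is not moved."
  (let ((runs '())
        begin)
    (save-excursion
      (goto-char (point-min))
      (while (progn
               ;; Step over everything that is NOT in RANGE.
               (skip-chars-forward (concat "^" range))
               (not (eobp)))
        (setq begin (point))
        ;; Consume the run of characters that ARE in RANGE.
        (skip-chars-forward range)
        (push (cons begin (point)) runs)))
    (nreverse runs)))
```

In a buffer containing "ab12cd345", `(my-collect-runs "0-9")' returns `((3 . 5) (7 . 10))'. Because of `save-excursion', the caller's point is left unchanged, as in the original function.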
(defun unicode-error-translate-region (begin end &optional buffer table)
  "Translate the Unicode error sequences in BUFFER between BEGIN and END.

The error sequences are transformed, by default, into the ASCII,
control-1 and latin-iso8859-1 characters with the numeric values
corresponding to the incorrect octets encountered.  This is achieved
by using `unicode-error-default-translation-table' (which see) for
TABLE; you can change this by supplying another character table,
mapping from the error sequences to the desired characters.  "
  (unless table (setq table unicode-error-default-translation-table))
  (frob-unicode-errors-region
   (lambda (start finish)
     (translate-region start finish table))
   begin end buffer))

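;; Usage sketch (illustrative only, not taken from any caller in XEmacs):
;; with the default table, the call below replaces every invalid sequence
;; in the accessible portion of the current buffer with the character
;; whose numeric value is that of the offending octet.  Passing another
;; character table as the fourth argument, TABLE, maps the sequences to
;; other characters instead.  The function only exists in Mule builds;
;; without Mule it is uninterned by the `unless' form below.
;;
;;   (unicode-error-translate-region (point-min) (point-max))
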
;; Sure would be nice to be able to use defface here.
(copy-face 'highlight 'unicode-invalid-sequence-warning-face)

(unless (featurep 'mule)
  ;; We do this in a roundabout way (instead of putting the above defun
  ;; and defvar calls inside a (when (featurep 'mule) ...) form) so that
  ;; make-docfile.c picks up symbol and function documentation correctly.
  ;; An alternative approach would be to fix make-docfile.c to be able to
  ;; read Lisp.
  (mapc #'unintern
        '(ccl-encode-to-ucs-2 unicode-error-default-translation-table
          unicode-invalid-regexp-range frob-unicode-errors-region
          unicode-error-translate-region unicode-query-coding-region
          unicode-query-coding-skip-chars-arg)))

;; #### UTF-7 is not yet implemented, and it's tricky to do.  There's
;; an implementation in appendix A.1 of the Unicode Standard, Version
;; 2.0, but I don't know its licensing characteristics.

; (make-coding-system
;  'utf-7 'unicode
;  "UTF-7"
;  '(mnemonic "UTF7"
;    documentation "UTF-7 Unicode encoding -- 7-bit-ASCII modal Internet-mail-compatible
; encoding especially designed for headers, with the following
; properties:

; -- Only characters that are considered safe for passing through any mail
;    gateway without damage are used.

; -- This is a modal encoding, with two states.  The first, default
;    state encodes the most common Unicode characters (upper and
;    lowercase letters, digits, and 9 common punctuation marks) as
;    themselves, and the second state, entered using '+' and
;    terminated with '-' or any character disallowed in state 2,
;    encodes any Unicode character by first converting to UTF-16,
;    most significant byte first, and then to a slightly modified
;    Base64 encoding.  (Thus, UTF-7 has the same limitations on the
;    characters it can encode as UTF-16.)

; -- The modified Base64 encoding deviates from standard Base64 in
;    that it omits the `=' pad character.  This is eliminated so as to
;    avoid conflicts with the use of `=' as an escape in the
;    Quoted-Printable encoding and the related Q encoding for headers.
;    With this modification, non-whitespace chars in UTF-7 will be
;    represented in Quoted-Printable and in Q as-is, with no further
;    encoding.

; For more information, see RFC 2152, or Appendix A.1 of The Unicode
; Standard 2.0."
;    unicode-type utf-7))
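
;; A worked example of the UTF-7 encoding described above, taken from
;; RFC 2152 (it is not the output of any code in this file): the text
;; "Hi Mom -", U+263A WHITE SMILING FACE, "-!" encodes as
;; "Hi Mom -+Jjo--!".  In UTF-16, most significant byte first, U+263A is
;; the octets #x26 #x3A, i.e. the bits 00100110 00111010.  Split into
;; 6-bit groups and padded with zero bits, these give 001001 100011
;; 101000, the Base64 digits 9, 35 and 40, written "J", "j" and "o"; no
;; `=' pad follows.  The first `-' after "Jjo" ends the shifted run, and
;; the second is the literal `-' from the input (without the explicit
;; terminator, the decoder would absorb that literal `-').  A literal `+'
;; can be written as "+-", so "A+B" can be encoded as "A+-B".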