annotate man/lispref/mule.texi @ 2367:ecf1ebac70d8

[xemacs-hg @ 2004-11-04 23:05:23 by ben] commit mega-patch configure.in: Turn off -Winline and -Wchar-subscripts. Use the right set of cflags when compiling modules. Rewrite ldap configuration to separate the inclusion of lber (needed in recent Cygwin) from the basic checks for the needed libraries. add a function for MAKE_JUNK_C; initially code was added to generate xemacs.def using this, but it will need to be rewritten. add an rm -f for junk.c to avoid weird Cygwin bug with cp -f onto an existing file. Sort list of auto-detected functions and eliminate unused checks for stpcpy, setlocale and getwd. Add autodetection of Cygwin scanf problems BETA: Rewrite section on configure to indicate what flags are important and what not. digest-doc.c, make-dump-id.c, profile.c, sorted-doc.c: Add proper decls for main(). make-msgfile.c: Document that this is old junk. Move proposal to text.c. make-msgfile.lex: Move proposal to text.c. make-mswin-unicode.pl: Convert error-generating code so that the entire message will be seen as a single unrecognized token. mule/mule-ccl.el: Update docs. lispref/mule.texi: Update CCL docs. ldap/eldap.c: Mule-ize. Use EXTERNAL_LIST_LOOP_2 instead of deleted EXTERNAL_LIST_LOOP. * XEmacs 21.5.18 "chestnut" is released. --------------------------------------------------------------- MULE-RELATED WORK: --------------------------------------------------------------- --------------------------- byte-char conversion --------------------------- buffer.c, buffer.h, insdel.c, text.c: Port FSF algorithm for byte-char conversion, replacing broken previous version. Track the char position of the gap. Add functions to do char-byte conversion downwards as well as upwards. Move comments about algorithm workings to internals manual. 
--------------------------- work on types --------------------------- alloc.c, console-x-impl.h, dump-data.c, dump-data.h, dumper.c, dialog-msw.c, dired-msw.c, doc.c, editfns.c, esd.c, event-gtk.h, event-msw.c, events.c, file-coding.c, file-coding.h, fns.c, glyphs-eimage.c, glyphs-gtk.c, glyphs-msw.c, glyphs-shared.c, glyphs-x.c, glyphs.c, glyphs.h, gui.c, hpplay.c, imgproc.c, intl-win32.c, lrecord.h, lstream.c, keymap.c, lisp.h, libsst.c, linuxplay.c, miscplay.c, miscplay.h, mule-coding.c, nas.c, nt.c, ntheap.c, ntplay.c, objects-msw.c, objects-tty.c, objects-x.c, print.c, process-nt.c, process.c, redisplay.h, select-common.h, select-gtk.c, select-x.c, sgiplay.c, sound.c, sound.h, sunplay.c, sysfile.h, sysdep.c, syswindows.h, text.c, unexnt.c, win32.c, xgccache.c: Further work on types. This creates a full set of types for all the basic semantics of `char' that I have so far identified, so that its semantics can always be identified for the purposes of proper Mule-safe code, and the raw use of `char' always avoided. (1) More type renaming, for consistency of naming. Char_ASCII -> Ascbyte UChar_ASCII -> UAscbyte Char_Binary -> CBinbyte UChar_Binary -> Binbyte SChar_Binary -> SBinbyte (2) Introduce Rawbyte, CRawbyte, Boolbyte, Chbyte, UChbyte, and Bitbyte and use them. (3) New types Itext, Wexttext and Textcount for separating out the concepts of bytes and textual units (different under UTF-16 and UTF-32, which are potential internal encodings). (4) qxestr*_c -> qxestr*_ascii. lisp.h: New; goes with other qxe() functions. #### Maybe goes in a different section. lisp.h: Group generic int-type defs together with EMACS_INT defs. lisp.h: * lisp.h (WEXTTEXT_IS_WIDE) New defns. lisp.h: New type to replace places where int occurs as a boolean. It's signed because occasionally people may want to use -1 as an error value, and because unsigned ints are viral -- see comments in the internals manual against using them. dynarr.c: int -> Bytecount. 
--------------------------- Mule-izing --------------------------- device-x.c: Partially Mule-ize. dumper.c, dumper.h: Mule-ize. Use Rawbyte. Use stderr_out not printf. Use wext_*(). sysdep.c, syswindows.h, text.c: New Wexttext API for manipulation of external text that may be Unicode (e.g. startup code under Windows). emacs.c: Mule-ize. Properly deal with argv in external encoding. Use wext_*() and Wexttext. Use Rawbyte. #if 0 some old junk on SCO that is unlikely to be correct. Rewrite allocation code in run-temacs. emacs.c, symsinit.h, win32.c: Rename win32 init function and call it even earlier, to initialize mswindows_9x_p even earlier, for use in startup code (XEUNICODE_P). process.c: Use _wenviron not environ under Windows, to get Unicode environment variables. event-Xt.c: Mule-ize drag-n-drop related stuff. dragdrop.c, dragdrop.h, frame-x.c: Mule-ize. text.h: Add some more stand-in defines for particular kinds of conversion; use in Mule-ization work in frame-x.c etc. --------------------------- Freshening --------------------------- intl-auto-encap-win32.c, intl-auto-encap-win32.h: Regenerate. --------------------------- Unicode-work --------------------------- intl-win32.c, syswindows.h: Factor out common options to MultiByteToWideChar and WideCharToMultiByte. Add convert_unicode_to_multibyte_malloc() and convert_unicode_to_multibyte_dynarr() and use. Add stuff for alloca() conversion of multibyte/unicode. alloc.c: Use dfc_external_data_len() in case of unicode coding system. alloc.c, mule-charset.c: Don't zero out and reinit charset Unicode tables. This fucks up dump-time loading. Anyway, either we load them at dump time or run time, never both. unicode.c: Dump the blank tables as well. 
--------------------------------------------------------------- DOCUMENTATION, MOSTLY MULE-RELATED: --------------------------------------------------------------- EmacsFrame.c, emodules.c, event-Xt.c, fileio.c, input-method-xlib.c, mule-wnnfns.c, redisplay-gtk.c, redisplay-tty.c, redisplay-x.c, regex.c, sysdep.c: Add comment about Mule work needed. text.h: Add more documentation describing why DFC routines were not written to return their value. Add some other DFC documentation. console-msw.c, console-msw.h: Add pointer to docs in win32.c. emacs.c: Add comments on sources of doc info. text.c, charset.h, unicode.c, intl-win32.c, intl-encap-win32.c, text.h, file-coding.c, mule-coding.c: Collect background comments and related to text matters and internationalization, and proposals for work to be done, in text.c or Internals manual, stuff related to specific textual API's in text.h, and stuff related to internal implementation of Unicode conversion in unicode.c. Put lots of pointers to the comments to make them easier to find. s/mingw32.h, s/win32-common.h, s/win32-native.h, s/windowsnt.h, win32.c: Add bunches of new documentation on the different kinds of builds and environments under Windows and how they work. Collect this info in win32.c. Add pointers to these docs in the relevant s/* files. emacs.c: Document places with long comments. Remove comment about exiting, move to internals manual, put in pointer. event-stream.c: Move docs about event queues and focus to internals manual, put in pointer. events.h: Move docs about event stream callbacks to internals manual, put in pointer. profile.c, redisplay.c, signal.c: Move documentation to the Internals manual. process-nt.c: Add pointer to comment in win32-native.el. lisp.h: Add comments about some comment conventions. lisp.h: Add comment about the second argument. device-msw.c, redisplay-msw.c: @@#### comments are out-of-date. 
--------------------------------------------------------------- PDUMP WORK (MOTIVATED BY UNICODE CHANGES) --------------------------------------------------------------- alloc.c, buffer.c, bytecode.c, console-impl.h, console.c, device.c, dumper.c, lrecord.h, elhash.c, emodules.h, events.c, extents.c, frame.c, glyphs.c, glyphs.h, mule-charset.c, mule-coding.c, objects.c, profile.c, rangetab.c, redisplay.c, specifier.c, specifier.h, window.c, lstream.c, file-coding.h, file-coding.c: PDUMP: Properly implement dump_add_root_block(), which never worked before, and is necessary for dumping Unicode tables. Pdump name changes for accuracy: XD_STRUCT_PTR -> XD_BLOCK_PTR. XD_STRUCT_ARRAY -> XD_BLOCK_ARRAY. XD_C_STRING -> XD_ASCII_STRING. *_structure_* -> *_block_*. lrecord.h: some comments added about dump_add_root_block() vs dump_add_root_block_ptr(). extents.c: remove incorrect comment about pdump problems with gap array. --------------------------------------------------------------- ALLOCATION --------------------------------------------------------------- abbrev.c, alloc.c, bytecode.c, casefiddle.c, device-msw.c, device-x.c, dired-msw.c, doc.c, doprnt.c, dragdrop.c, editfns.c, emodules.c, file-coding.c, fileio.c, filelock.c, fns.c, glyphs-eimage.c, glyphs-gtk.c, glyphs-msw.c, glyphs-x.c, gui-msw.c, gui-x.c, imgproc.c, intl-win32.c, lread.c, menubar-gtk.c, menubar.c, nt.c, objects-msw.c, objects-x.c, print.c, process-nt.c, process-unix.c, process.c, realpath.c, redisplay.c, search.c, select-common.c, symbols.c, sysdep.c, syswindows.h, text.c, text.h, ui-byhand.c: New macros {alloca,xnew}_{itext,{i,ext,raw,bin,asc}bytes} for more convenient allocation of these commonly requested items. Modify functions to use alloca_ibytes, alloca_array, alloca_extbytes, xnew_ibytes, etc. also XREALLOC_ARRAY, xnew. alloc.c: Rewrite the allocation functions to factor out repeated code. Add assertions for freeing dumped data. lisp.h: Moved down and consolidated with other allocation stuff. 
lisp.h, dynarr.c: New functions for allocation that's very efficient when mostly in LIFO order. lisp.h, text.c, text.h: Factor out some stuff for general use by alloca()-conversion funs. text.h, lisp.h: Fill out convenience routines for allocating various kinds of bytes and put them in lisp.h. Use them in place of xmalloc(), ALLOCA(). text.h: Fill out the convenience functions so the _MALLOC() kinds match the alloca() kinds. --------------------------------------------------------------- ERROR-CHECKING --------------------------------------------------------------- text.h: Create ASSERT_ASCTEXT_ASCII() and ASSERT_ASCTEXT_ASCII_LEN() from similar Eistring checkers and change the Eistring checkers to use them instead. --------------------------------------------------------------- MACROS IN LISP.H --------------------------------------------------------------- lisp.h: Redo GCPRO declarations. Create a "base" set of functions that can be used to generate any kind of gcpro sets -- regular, ngcpro, nngcpro, private ones used in GC_EXTERNAL_LIST_LOOP_2. buffer.c, callint.c, chartab.c, console-msw.c, device-x.c, dialog-msw.c, dired.c, extents.c, ui-gtk.c, rangetab.c, nt.c, mule-coding.c, minibuf.c, menubar-msw.c, menubar.c, menubar-gtk.c, lread.c, lisp.h, gutter.c, glyphs.c, glyphs-widget.c, fns.c, fileio.c, file-coding.c, specifier.c: Eliminate EXTERNAL_LIST_LOOP, which does not check for circularities. Use EXTERNAL_LIST_LOOP_2 instead or EXTERNAL_LIST_LOOP_3 or EXTERNAL_PROPERTY_LIST_LOOP_3 or GC_EXTERNAL_LIST_LOOP_2 (new macro). Removed/redid comments on EXTERNAL_LIST_LOOP. --------------------------------------------------------------- SPACING FIXES --------------------------------------------------------------- callint.c, hftctl.c, number-gmp.c, process-unix.c: Spacing fixes. 
--------------------------------------------------------------- FIX FOR GEOMETRY PROBLEM IN FIRST FRAME --------------------------------------------------------------- unicode.c: Add workaround for newlib bug in sscanf() [should be fixed by release 1.5.12 of Cygwin]. toolbar.c: bug fix for problem of initial frame being 77 chars wide on Windows. will be overridden by my other ws. --------------------------------------------------------------- FIX FOR LEAKING PROCESS HANDLES: --------------------------------------------------------------- process-nt.c: Fixes for leaking handles. Inspired by work done by Adrian Aichner <adrian@xemacs.org>. --------------------------------------------------------------- FIX FOR CYGWIN BUG (Unicode-related): --------------------------------------------------------------- unicode.c: Add workaround for newlib bug in sscanf() [should be fixed by release 1.5.12 of Cygwin]. --------------------------------------------------------------- WARNING FIXES: --------------------------------------------------------------- console-stream.c: `reinit' is unused. compiler.h, event-msw.c, frame-msw.c, intl-encap-win32.c, text.h: Add stuff to deal with ANSI-aliasing warnings I got. regex.c: Gather includes together to avoid warning. --------------------------------------------------------------- CHANGES TO INITIALIZATION ROUTINES: --------------------------------------------------------------- buffer.c, emacs.c, console.c, debug.c, device-x.c, device.c, dragdrop.c, emodules.c, eval.c, event-Xt.c, event-gtk.c, event-msw.c, event-stream.c, event-tty.c, events.c, extents.c, faces.c, file-coding.c, fileio.c, font-lock.c, frame-msw.c, glyphs-widget.c, glyphs.c, gui-x.c, insdel.c, lread.c, lstream.c, menubar-gtk.c, menubar-x.c, minibuf.c, mule-wnnfns.c, objects-msw.c, objects.c, print.c, scrollbar-x.c, search.c, select-x.c, text.c, undo.c, unicode.c, window.c, symsinit.h: Call reinit_*() functions directly from emacs.c, for clarity. 
Factor out some redundant init code. Move disallowed stuff that had crept into vars_of_glyphs() into complex_vars_of_glyphs(). Call init_eval_semi_early() from eval.c not in the middle of vars_of_() in emacs.c since there should be no order dependency in the latter calls. --------------------------------------------------------------- ARMAGEDDON: --------------------------------------------------------------- alloc.c, emacs.c, lisp.h, print.c: Rename inhibit_non_essential_printing_operations to inhibit_non_essential_conversion_operations. text.c: Assert on !inhibit_non_essential_conversion_operations. console-msw.c, print.c: Don't do conversion in SetConsoleTitle or FindWindow to avoid problems during armageddon. Put #errors for NON_ASCII_INTERNAL_FORMAT in places where problems would arise. --------------------------------------------------------------- CHANGES TO THE BUILD PROCEDURE: --------------------------------------------------------------- config.h.in, s/cxux.h, s/usg5-4-2.h, m/powerpc.h: Add comment about correct ordering of this file. Rearrange everything to follow this -- put all #undefs together and before the s&m files. Add undefs for HAVE_ALLOCA, C_ALLOCA, BROKEN_ALLOCA_IN_FUNCTION_CALLS, STACK_DIRECTION. Remove unused HAVE_STPCPY, HAVE_GETWD, HAVE_SETLOCALE. m/gec63.h: Deleted; totally broken, not used at all, not in FSF. 
m/7300.h, m/acorn.h, m/alliant-2800.h, m/alliant.h, m/altos.h, m/amdahl.h, m/apollo.h, m/att3b.h, m/aviion.h, m/celerity.h, m/clipper.h, m/cnvrgnt.h, m/convex.h, m/cydra5.h, m/delta.h, m/delta88k.h, m/dpx2.h, m/elxsi.h, m/ews4800r.h, m/gould.h, m/hp300bsd.h, m/hp800.h, m/hp9000s300.h, m/i860.h, m/ibmps2-aix.h, m/ibmrs6000.h, m/ibmrt-aix.h, m/ibmrt.h, m/intel386.h, m/iris4d.h, m/iris5d.h, m/iris6d.h, m/irist.h, m/isi-ov.h, m/luna88k.h, m/m68k.h, m/masscomp.h, m/mg1.h, m/mips-nec.h, m/mips-siemens.h, m/mips.h, m/news.h, m/nh3000.h, m/nh4000.h, m/ns32000.h, m/orion105.h, m/pfa50.h, m/plexus.h, m/pmax.h, m/powerpc.h, m/pyrmips.h, m/sequent-ptx.h, m/sequent.h, m/sgi-challenge.h, m/symmetry.h, m/tad68k.h, m/tahoe.h, m/targon31.h, m/tekxd88.h, m/template.h, m/tower32.h, m/tower32v3.h, m/ustation.h, m/vax.h, m/wicat.h, m/xps100.h: Delete C_ALLOCA, HAVE_ALLOCA, STACK_DIRECTION, BROKEN_ALLOCA_IN_FUNCTION_CALLS. All of this is auto-detected. When in doubt, I followed recent FSF sources, which also have these things deleted.
author ben
date Thu, 04 Nov 2004 23:08:28 +0000
parents d6d41d23b6ec
children a4040d921acc
rev   line source
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1 @c -*-texinfo-*-
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2 @c This is part of the XEmacs Lisp Reference Manual.
775
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
3 @c Copyright (C) 1996 Ben Wing, 2001-2002 Free Software Foundation.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
4 @c See the file lispref.texi for copying conditions.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
5 @setfilename ../../info/internationalization.info
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
6 @node MULE, Tips, Internationalization, top
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
7 @chapter MULE
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
8
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
9 @dfn{MULE} is the name originally given to the version of GNU Emacs
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
10 extended for multi-lingual (and in particular Asian-language) support.
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
11 ``MULE'' is short for ``MUlti-Lingual Emacs''. It is an extension and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
12 complete rewrite of Nemacs (``Nihon Emacs'' where ``Nihon'' is the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
13 Japanese word for ``Japan''), which only provided support for Japanese.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
14 XEmacs refers to its multi-lingual support as @dfn{MULE support} since
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
15 it is based on @dfn{MULE}.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
16
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
17 @menu
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
18 * Internationalization Terminology::
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
19 Definition of various internationalization terms.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
20 * Charsets:: Sets of related characters.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
21 * MULE Characters:: Working with characters in XEmacs/MULE.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
22 * Composite Characters:: Making new characters by overstriking other ones.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
23 * Coding Systems:: Ways of representing a string of chars using integers.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
24 * CCL:: A special language for writing fast converters.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
25 * Category Tables:: Subdividing charsets into groups.
775
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
26 * Unicode Support:: The universal coded character set.
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
27 * Charset Unification:: Handling overlapping character sets.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
28 * Charsets and Coding Systems:: Tables and reference information.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
29 @end menu
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
30
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
31 @node Internationalization Terminology, Charsets, , MULE
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
32 @section Internationalization Terminology
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
33
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
34 In internationalization terminology, a string of text is divided up
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
35 into @dfn{characters}, which are the printable units that make up the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
36 text. A single character is (for example) a capital @samp{A}, the
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
37 number @samp{2}, a Katakana character, a Hangul character, a Kanji
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
38 ideograph (an @dfn{ideograph} is a ``picture'' character, such as is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
39 used in Japanese Kanji, Chinese Hanzi, and Korean Hanja; typically there
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
40 are thousands of such ideographs in each language), etc. The basic
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
41 property of a character is that it is the smallest unit of text with
1261
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
42 semantic significance in text processing---i.e., characters are abstract
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
43 units defined by their meaning, not by their exact appearance.
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
44
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
45 Human beings normally process text visually, so to a first approximation
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
46 a character may be identified with its shape. Note that the same
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
47 character may be drawn by two different people (or in two different
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
48 fonts) in slightly different ways, although the ``basic shape'' will be the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
49 same. But consider the works of Scott Kim; human beings can recognize
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
50 hugely variant shapes as the ``same'' character. Sometimes, especially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
51 where characters are extremely complicated to write, completely
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
52 different shapes may be defined as the ``same'' character in national
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
53 standards. The Taiwanese variant of Hanzi is generally the most
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
54 complicated; over the centuries, the Japanese, Koreans, and the People's
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
55 Republic of China have adopted simplifications of the shape, but the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
56 line of descent from the original shape is recorded, and the meanings
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
57 and pronunciation of different forms of the same character are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
58 considered to be identical within each language. (Of course, it may
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
59 take a specialist to recognize the related form; the point is that the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
60 relations are standardized, despite the differing shapes.)
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
61
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
62 In some cases, the differences will be significant enough that it is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
63 actually possible to identify two or more distinct shapes that both
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
64 represent the same character. For example, the lowercase letters
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
65 @samp{a} and @samp{g} each have two distinct possible shapes---the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
66 @samp{a} can optionally have a curved tail projecting off the top, and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
67 the @samp{g} can be formed either of two loops, or of one loop and a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
68 tail hanging off the bottom. Such distinct possible shapes of a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
69 character are called @dfn{glyphs}. The important characteristic of two
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
70 glyphs making up the same character is that the choice between one or
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
71 the other is purely stylistic and has no linguistic effect on a word
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
72 (this is the reason why a capital @samp{A} and lowercase @samp{a}
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
73 are different characters rather than different glyphs---e.g.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
74 @samp{Aspen} is a city while @samp{aspen} is a kind of tree).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
75
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
76 Note that @dfn{character} and @dfn{glyph} are used differently
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
77 here than elsewhere in XEmacs.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
78
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
79 A @dfn{character set} is essentially a set of related characters. ASCII,
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
80 for example, is a set of 94 characters (or 128, if you count
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
81 non-printing characters). Other character sets are ISO8859-1 (ASCII
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
82 plus various accented characters and other international symbols),
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
83 JIS X 0201 (ASCII, more or less, plus half-width Katakana), JIS X 0208
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
84 (Japanese Kanji), JIS X 0212 (a second set of less-used Japanese Kanji),
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
85 GB2312 (Mainland Chinese Hanzi), etc.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
86
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
87 The definition of a character set will implicitly or explicitly give
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
88 it an @dfn{ordering}, a way of assigning a number to each character in
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
89 the set. For many character sets, there is a natural ordering, for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
90 example the ``ABC'' ordering of the Roman letters. But it is not clear
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
91 whether digits should come before or after the letters, and in fact
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
92 different European languages treat the ordering of accented characters
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
93 differently. It is useful to use the natural order where available, of
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
94 course. The number assigned to any particular character is called the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
95 character's @dfn{code point}. (Within a given character set, each
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
96 character has a unique code point. Thus the word ``set'' is ill-chosen;
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
97 different orderings of the same characters are different character sets.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
98 Identifying characters is simple enough for alphabetic character sets,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
99 but the difference in ordering can cause great headaches when the same
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
100 thousands of characters are used by different cultures as in the Hanzi.)
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
101
1261
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
102 It's important to understand that a character is defined not by any
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
103 number attached to it, but by its meaning. For example, ASCII and
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
104 EBCDIC are two charsets containing exactly the same characters
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
105 (lowercase and uppercase letters, numbers 0 through 9, particular
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
106 punctuation marks) but with different numberings. The @samp{comma}
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
107 character in ASCII and EBCDIC, for instance, is the same character
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
108 despite having a different numbering. Conversely, when comparing ASCII
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
109 and JIS-Roman, which look the same except that the latter has a yen sign
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
110 substituted for the backslash, we would say that the backslash and yen
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
111 sign are @emph{not} the same characters, despite having the same number
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
112 (92) and despite the fact that all other characters are present in both
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
113 charsets, with the same numbering. ASCII and JIS-Roman, then, do
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
114 @emph{not} have exactly the same characters in them (ASCII has a
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
115 backslash character but no yen-sign character, and vice-versa for
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
116 JIS-Roman), unlike ASCII and EBCDIC, even though the numberings in ASCII
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
117 and JIS-Roman are closer.
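
The distinction drawn above between a character and its number can be
sketched outside XEmacs in a few lines of Python. This is an illustration
under stated assumptions, not an XEmacs API: the @samp{cp037} code page
stands in for EBCDIC (EBCDIC has many variants), and
@code{decode_jis_roman} is a hypothetical helper that models only the
backslash/yen substitution described in the text.

```python
# Illustration only; the helper names are hypothetical, not an XEmacs API.

# Same character, different code points: the comma in ASCII and in EBCDIC.
# Assumption: "cp037" (one common EBCDIC code page) stands in for EBCDIC.
comma_ascii = ",".encode("ascii")[0]
comma_ebcdic = ",".encode("cp037")[0]

# Code point shared by the ASCII backslash and the JIS-Roman yen sign.
BACKSLASH_CODE = 0x5C  # decimal 92


def decode_ascii(code: int) -> str:
    """Return the ASCII character at a code point in the range 0..127."""
    return bytes([code]).decode("ascii")


def decode_jis_roman(code: int) -> str:
    """Simplified JIS-Roman: ASCII, except the yen sign replaces backslash."""
    return "\u00a5" if code == BACKSLASH_CODE else decode_ascii(code)


if __name__ == "__main__":
    # One character (comma), two different numbers.
    print(comma_ascii, comma_ebcdic)
    # One number, two different characters.
    print(decode_ascii(BACKSLASH_CODE), decode_jis_roman(BACKSLASH_CODE))
```

The first pair of values shows one character carrying two different
numbers; the second shows one number naming two different characters,
which is why a charset cannot be identified with its numbering alone.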
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
118
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
119 Sometimes, a code point is not a single number, but instead a group of
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
120 numbers, called @dfn{position codes}. In such cases, the number of
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
121 position codes required to index a particular character in a character
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
122 set is called the @dfn{dimension} of the character set. Character sets
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
123 indexed by more than one position code typically use byte-sized position
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
124 codes. Small character sets, e.g. ASCII, invariably use a single
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
125 position code, but for larger character sets, the choice of whether to
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
126 use multiple position codes or a single large (16-bit or 32-bit) number
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
127 is arbitrary. Unicode typically uses a single large number, but
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
128 language-specific or "national" character sets often use multiple
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
129 (usually two) position codes. For example, JIS X 0208, i.e. Japanese
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
130 Kanji, has thousands of characters, and is of dimension two -- every
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
131 character is indexed by two position codes, each in the range 1 through
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
132 94. (This number ``94'' is not a coincidence; it is the same as the
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
133 number of printable characters in ASCII, and was chosen so that JIS
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
134 characters could be directly encoded using two printable ASCII
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
135 characters.) Note that the choice of the range here is somewhat
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
136 arbitrary -- it could just as easily be 0 through 93, 2 through 95, etc.
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
137 In fact, the range for JIS position codes (and for other character sets
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
138 modeled after it) is often given as range 33 through 126, so as to
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
139 directly match ASCII printing characters.
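As a rough illustration of dimension, the following Python sketch treats a code point in a dimension-2 character set as two position codes, each in the range 1 through 94, and flattens them into a single index. The function name and the row/cell terminology are assumptions made for this example; they are not part of XEmacs.

```python
CHARS_PER_DIMENSION = 94  # size of each position-code range, as in JIS X 0208


def linear_index(row, cell):
    """Map two position codes (each 1..94) to a single index in 0..8835."""
    for code in (row, cell):
        if not 1 <= code <= CHARS_PER_DIMENSION:
            raise ValueError("position code out of range: %d" % code)
    return (row - 1) * CHARS_PER_DIMENSION + (cell - 1)


# A dimension-2 set with 94 codes per dimension has room for 94 * 94 characters.
capacity = CHARS_PER_DIMENSION ** 2
print(capacity)
print(linear_index(1, 1), linear_index(94, 94))
```

Choosing the range 1 through 94 rather than 33 through 126 only changes the constant subtracted from each position code, which is why the text calls the choice of range somewhat arbitrary.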
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
140
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
141 An @dfn{encoding} is a way of numerically representing characters from
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
142 one or more character sets into a stream of like-sized numerical values
1261
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
143 called @dfn{words} -- typically 8-bit bytes, but sometimes 16-bit or
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
144 32-bit quantities. It's very important to clearly distinguish between
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
145 charsets and encodings. For a simple charset like ASCII, there is only
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
146 one encoding normally used -- each character is represented by a single
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
147 byte, with the same value as its code point. For more complicated
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
148 charsets, however, or when a single encoding needs to represent more
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
149 than one charset, things are not so obvious.  Unicode version 2, for
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
150 example, is a large charset with thousands of characters, each indexed
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
151 by a 16-bit number, often represented in hex, e.g. 0x05D0 for the Hebrew
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
152 letter "aleph". One obvious encoding (actually two encodings, depending
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
153 on which of the two possible byte orderings is chosen) simply uses two
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
154 bytes per character. This encoding is convenient for internal
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
155 processing of Unicode text; however, it's incompatible with ASCII, and
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
156 thus external text (files, e-mail, etc.) that is encoded this way is
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
157 completely uninterpretable by programs lacking Unicode support. For
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
158 this reason, a different, ASCII-compatible encoding, e.g. UTF-8, is
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
159 usually used for external text. UTF-8 represents Unicode characters
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
160 with one to three bytes (often extended to six bytes to handle
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
161 characters with up to 31-bit indices). Unicode characters 00 to 7F
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
162 (identical with ASCII) are directly represented with one byte, and other
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
163 characters with two or more bytes, each in the range 80 to FF.
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
164 Applications that don't understand Unicode will still be able to process
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
165 ASCII characters represented in UTF-8-encoded text, and will typically
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
166 ignore (and hopefully preserve) the high-bit characters.
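The difference between the two-byte encodings and UTF-8 can be seen with a short Python sketch. It uses Python's standard codecs, not anything in XEmacs, and relies on the fact that for 16-bit code points such as 0x05D0 the codecs named utf-16-be and utf-16-le produce exactly the two-byte forms described above.

```python
aleph = "\u05d0"  # Hebrew letter aleph, code point 0x05D0

big_endian = aleph.encode("utf-16-be")     # high-order byte first
little_endian = aleph.encode("utf-16-le")  # low-order byte first
utf8 = aleph.encode("utf-8")               # two bytes, both with the high bit set

print(big_endian.hex(), little_endian.hex(), utf8.hex())

# ASCII text is unchanged by UTF-8 but not by the two-byte encoding.
ascii_utf8 = "A".encode("utf-8")
ascii_wide = "A".encode("utf-16-be")
print(ascii_utf8, ascii_wide)
```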
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
167
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
168 Naive use of code points is also not possible if more than one
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
169 character set is to be used in the encoding. For example, printed
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
170 Japanese text typically requires characters from multiple character sets
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
171 -- ASCII, JIS X 0208, and JIS X 0212, to be specific. Each of these is
1261
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
172 indexed using one or more position codes in the range 1 through 94 (or
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
173 33 through 126), so the position codes could not be used directly or
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
174 there would be no way to tell which character was meant. Different
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
175 Japanese encodings handle this differently -- JIS uses special escape
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
176 characters to denote different character sets; EUC sets the high bit of
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
177 the position codes for JIS X 0208 and JIS X 0212, and puts a special
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
178 extra byte before each JIS X 0212 character; etc.
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
179
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
180 The encodings described above are all 7-bit or 8-bit encodings. The
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
181 fixed-width Unicode encoding previously described, however, is sometimes
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
182 considered to be a 16-bit encoding, in which case the issue of byte
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
183 ordering does not come up. (Imagine, for example, that the text is
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
184 represented as an array of shorts.) Similarly, Unicode version 3 (which
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
185 has characters with indices above 0xFFFF), and other very large
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
186 character sets, may be represented internally as 32-bit encodings,
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
187 i.e. arrays of ints. However, it does not make too much sense to talk
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
188 about 16-bit or 32-bit encodings for external data, since nowadays 8-bit
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
189 data is a universal standard -- the closest you can get is fixed-width
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
190 encodings using two or four bytes to encode 16-bit or 32-bit values. (A
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
191 "7-bit" encoding is used when it cannot be guaranteed that the high bit
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
192 of 8-bit data will be correctly preserved. Some e-mail gateways, for
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
193 example, strip the high bit of text passing through them. These same
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
194 gateways often handle non-printable characters incorrectly, and so 7-bit
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben]
ben
parents: 1188
diff changeset
195 encodings usually avoid using bytes with such values.)
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
196
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
197 A general method of handling text using multiple character sets
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
198 (whether for multilingual text, or simply text in an extremely
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
199 complicated single language like Japanese) is defined in the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
200 international standard ISO 2022. ISO 2022 will be discussed in more
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
201 detail later (@pxref{ISO 2022}), but for now suffice it to say that text
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
202 needs control functions (at least spacing), and if escape sequences are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
203 to be used, an escape sequence introducer. It was decided to make all
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
204 text streams compatible with ASCII in the sense that the codes 0--31
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
205 (and 128--159) would always be control codes, never graphic characters,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
206 and where defined by the character set the @samp{SPC} character would be
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
207 assigned code 32, and @samp{DEL} would be assigned 127. Thus there are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
208 94 code points remaining if 7 bits are used. This is the reason that
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
209 most character sets are defined using position codes in the range 1
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
210 through 94. Then ISO 2022 compatible encodings are produced by shifting
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
211 the position codes 1 to 94 into character codes 33 to 126, or (if 8 bit
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
212 codes are available) into character codes 161 to 254.
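The arithmetic in the preceding paragraph can be sketched in Python as follows. The function names are hypothetical and chosen only for this example; the numeric ranges are the ones given in the text.

```python
def is_control_code(code):
    """True for the ranges always reserved for control codes: 0..31 and 128..159."""
    return 0 <= code <= 31 or 128 <= code <= 159


def to_7bit_code(position):
    """Shift a position code in 1..94 into the character codes 33..126."""
    if not 1 <= position <= 94:
        raise ValueError("position code out of range: %d" % position)
    return position + 32


def to_8bit_code(position):
    """Shift a position code in 1..94 into the character codes 161..254."""
    return to_7bit_code(position) + 128


# Codes left for graphic characters in 7 bits, after removing the
# control codes, SPC (32), and DEL (127).
graphic_codes = [c for c in range(128)
                 if not is_control_code(c) and c not in (32, 127)]
print(len(graphic_codes))
print(to_7bit_code(1), to_7bit_code(94), to_8bit_code(1), to_8bit_code(94))
```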
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
213
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
214 Encodings are classified as either @dfn{modal} or @dfn{non-modal}. In
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
215 a @dfn{modal encoding}, there are multiple states that the encoding can
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
216 be in, and the interpretation of the values in the stream depends on the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
217 current global state of the encoding. Special values in the encoding,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
218 called @dfn{escape sequences}, are used to change the global state.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
219 JIS, for example, is a modal encoding. The bytes @samp{ESC $ B}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
220 indicate that, from then on, bytes are to be interpreted as position
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
221 codes for JIS X 0208, rather than as ASCII. This effect is cancelled
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
222 using the bytes @samp{ESC ( B}, which mean ``switch from whatever the
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
223 current state is to ASCII''. To switch to JIS X 0212, the escape
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
224 sequence @samp{ESC $ ( D} is used.  (Note that here, as is common, the escape
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
225 sequences do in fact begin with @samp{ESC}. This is not necessarily the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
226 case, however. Some encodings use control characters called "locking
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
227 shifts" (effect persists until cancelled) to switch character sets.)
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
228
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
229 A @dfn{non-modal encoding} has no global state that extends past the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
230 character currently being interpreted. EUC, for example, is a
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
231 non-modal encoding. Characters in JIS X 0208 are encoded by setting
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
232 the high bit of the position codes, and characters in JIS X 0212 are
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
233 encoded by doing the same but also prefixing the character with the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
234 byte 0x8F.
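A minimal Python sketch of the EUC rule just described follows. The function name, the charset labels, and the convention of passing position codes as values in the range 33 through 126 are assumptions made for this example; this is not how XEmacs implements the conversion.

```python
JISX0212_PREFIX = 0x8F  # extra byte placed before each JIS X 0212 character


def euc_encode(charset, codes):
    """Encode one character from its charset label and position codes.

    `codes` holds one value for ASCII, or two values in 33..126 for
    the two-dimensional JIS character sets.
    """
    if charset == "ascii":
        return bytes(codes)
    high = bytes(c | 0x80 for c in codes)  # set the high bit of each code
    if charset == "jisx0208":
        return high
    if charset == "jisx0212":
        return bytes([JISX0212_PREFIX]) + high
    raise ValueError("unknown charset: %r" % (charset,))


print(euc_encode("ascii", (0x41,)).hex())
print(euc_encode("jisx0208", (0x30, 0x21)).hex())
print(euc_encode("jisx0212", (0x30, 0x21)).hex())
```

Because every byte carries enough information to classify itself (high bit clear for ASCII, high bit set for the JIS sets, 0x8F as a prefix), a decoder needs no state beyond the character it is currently reading.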
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
235
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
236 The advantage of a modal encoding is that it is generally more
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
237 space-efficient, and is easily extensible because there are essentially
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
238 an arbitrary number of escape sequences that can be created. The
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
239 disadvantage, however, is that it is much more difficult to work with
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
240 if it is not being processed in a sequential manner. In the non-modal
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
241 EUC encoding, for example, the byte 0x41 always refers to the letter
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
242 @samp{A}; whereas in JIS, it could either be the letter @samp{A}, or
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
243 one of the two position codes in a JIS X 0208 character, or one of the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
244 two position codes in a JIS X 0212 character. Determining exactly which
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
245 one is meant could be difficult and time-consuming if the previous
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
246 bytes in the string have not already been processed, or impossible if
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
247 they are drawn from an external stream that cannot be rewound.
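The cost of random access into a modal encoding can be sketched in Python. The function below is a hypothetical helper, not an XEmacs API; it assumes the stream starts in ASCII and recognizes only the three escape sequences mentioned above. To classify the byte at a given offset it has to scan everything before that offset.

```python
ESC = 0x1B

# Escape sequences named in the text, paired with the charset they select.
ESCAPES = (
    (b"\x1b$B", "jisx0208"),
    (b"\x1b(B", "ascii"),
    (b"\x1b$(D", "jisx0212"),
)


def charset_at(data, offset):
    """Return the charset in effect for the byte at `offset`.

    The whole prefix of the stream is scanned, because any earlier
    escape sequence may have changed the state.
    """
    state = "ascii"
    i = 0
    while i < offset:
        if data[i] == ESC:
            for seq, charset in ESCAPES:
                if data.startswith(seq, i):
                    state = charset
                    i += len(seq)
                    break
            else:
                raise ValueError("unrecognized escape sequence at %d" % i)
        else:
            i += 1
    return state


# The byte 0x41 appears three times with two different meanings.
data = b"A\x1b$B0A\x1b(BA"
print(charset_at(data, 0), charset_at(data, 5), charset_at(data, 9))
```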
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
248
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
249 Non-modal encodings are further divided into @dfn{fixed-width} and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
250 @dfn{variable-width} formats. A fixed-width encoding always uses
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
251 the same number of words per character, whereas a variable-width
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
252 encoding does not. EUC is a good example of a variable-width
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
253 encoding: one to three bytes are used per character, depending on
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
254 the character set. 16-bit and 32-bit encodings are nearly always
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
255 fixed-width, and this is in fact one of the main reasons for using
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
256 an encoding with a larger word size. The advantages of fixed-width
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
257 encodings should be obvious. The advantages of variable-width
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
258 encodings are that they are generally more space-efficient and allow
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
259 for compatibility with existing 8-bit encodings such as ASCII. (For
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
260 example, in Unicode ASCII characters are simply promoted to a 16-bit
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
261 representation. That means that every ASCII character contains a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
262 @samp{NUL} byte; evidently all of the standard string manipulation
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
263 functions will lose badly in a fixed-width Unicode environment.)
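The trade-off can be made concrete with a small Python sketch using the standard euc_jp and utf-16-be codecs. These are Python's codecs, used here only for illustration, and the sample string is an arbitrary choice.

```python
text = "A\u4e9c"  # one ASCII letter followed by one JIS X 0208 kanji

# Variable-width: EUC uses one byte for ASCII and two for JIS X 0208.
euc = text.encode("euc_jp")

# Fixed-width: every character in this string takes exactly two bytes.
wide = text.encode("utf-16-be")

# With a fixed width, the n-th character sits at a known byte offset.
second_char = wide[2:4].decode("utf-16-be")

# The 16-bit form of an ASCII character contains a NUL byte, which
# C-style string functions treat as the end of the string.
has_nul = b"\x00" in "A".encode("utf-16-be")

print(len(euc), len(wide), second_char == text[1], has_nul)
```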
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
264
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
265 The bytes in an 8-bit encoding are often referred to as @dfn{octets}
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
266 rather than simply as bytes. This terminology dates back to the days
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
267 before 8-bit bytes were universal, when some computers had 9-bit bytes,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
268 others had 10-bit bytes, etc.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
269
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
270 @node Charsets, MULE Characters, Internationalization Terminology, MULE
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
271 @section Charsets
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
272
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
273 A @dfn{charset} in MULE is an object that encapsulates a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
274 particular character set as well as an ordering of those characters.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
275 Charsets are permanent objects and are named using symbols, like
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
276 faces.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
277
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
278 @defun charsetp object
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
279 This function returns non-@code{nil} if @var{object} is a charset.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
280 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
281
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
282 @menu
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
283 * Charset Properties:: Properties of a charset.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
284 * Basic Charset Functions:: Functions for working with charsets.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
285 * Charset Property Functions:: Functions for accessing charset properties.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
286 * Predefined Charsets:: Predefined charset objects.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
287 @end menu
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
288
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
289 @node Charset Properties, Basic Charset Functions, , Charsets
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
290 @subsection Charset Properties
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
291
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
292 Charsets have the following properties:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
293
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
294 @table @code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
295 @item name
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
296 A symbol naming the charset. Every charset must have a different name;
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
297 this allows a charset to be referred to using its name rather than
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
298 the actual charset object.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
299 @item doc-string
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
300 A documentation string describing the charset.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
301 @item registry
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
302 A regular expression matching the font registry field for this character
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
303 set. For example, both the @code{ascii} and @code{latin-iso8859-1}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
304 charsets use the registry @code{"ISO8859-1"}. This field is used to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
305 choose an appropriate font when the user gives a general font
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
306 specification such as @samp{-*-courier-medium-r-*-140-*}, i.e. a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
307 14-point upright medium-weight Courier font.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
308 @item dimension
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
309 Number of position codes used to index a character in the character set.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
310 XEmacs/MULE can only handle character sets of dimension 1 or 2.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
311 This property defaults to 1.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
312 @item chars
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
313 Number of characters in each dimension. In XEmacs/MULE, the only
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
314 allowed values are 94 or 96. (There are a couple of pre-defined
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
315 character sets, such as ASCII, that do not follow this, but you cannot
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
316 define new ones like this.) Defaults to 94. Note that if the dimension
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
317 is 2, the character set thus described is 94x94 or 96x96.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
318 @item columns
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
319 Number of columns used to display a character in this charset.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
320 Only used in TTY mode. (Under X, the actual width of a character
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
321 can be derived from the font used to display the characters.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
322 If unspecified, defaults to the dimension. (This is almost
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
323 always the correct value, because character sets with dimension 2
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
324 are usually ideograph character sets, which need two columns to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
325 display the intricate ideographs.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
326 @item direction
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
327 A symbol, either @code{l2r} (left-to-right) or @code{r2l}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
328 (right-to-left). Defaults to @code{l2r}. This specifies the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
329 direction that the text should be displayed in, and will be
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
330 left-to-right for most charsets but right-to-left for Hebrew
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
331 and Arabic. (Right-to-left display is not currently implemented.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
332 @item final
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
333 Final byte of the standard ISO 2022 escape sequence designating this
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
334 charset. Must be supplied. Each combination of (@var{dimension},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
335 @var{chars}) defines a separate namespace for final bytes, and each
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
336 charset within a particular namespace must have a different final byte.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
337 Note that ISO 2022 restricts the final byte to the range 0x30 - 0x7E if
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
338 dimension == 1, and 0x30 - 0x5F if dimension == 2. Note also that final
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
339 bytes in the range 0x30 - 0x3F are reserved for user-defined (not
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
340 official) character sets. For more information on ISO 2022, see @ref{Coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
341 Systems}.
@item graphic
0 (use left half of font on output) or 1 (use right half of font on
output).  Defaults to 0.  This specifies how to convert the position
codes that index a character in a character set into an index into the
font used to display the character set.  With @code{graphic} set to 0,
position codes 33 through 126 map to font indices 33 through 126; with
it set to 1, position codes 33 through 126 map to font indices 161
through 254 (i.e. the same number but with the high bit set).  For
example, for a font whose registry is ISO8859-1, the left half of the
font (octets 0x20 - 0x7F) is the @code{ascii} charset, while the right
half (octets 0xA0 - 0xFF) is the @code{latin-iso8859-1} charset.
@item ccl-program
A compiled CCL program used to convert a character in this charset into
an index into the font.  This is in addition to the @code{graphic}
property.  If a CCL program is defined, the position codes of a
character will first be processed according to @code{graphic} and
then passed through the CCL program, with the resulting values used
to index the font.

This is used, for example, in the Big5 character set (used in Taiwan).
This character set is not ISO-2022-compliant, and its size (94x157) does
not fit within the maximum 96x96 size of ISO-2022-compliant character
sets.  As a result, XEmacs/MULE splits it (in a rather complex fashion,
so as to group the most commonly used characters together) into two
charset objects (@code{chinese-big5-1} and @code{chinese-big5-2}), each
of size 94x94, and each charset object uses a CCL program to convert the
modified position codes back into standard Big5 indices to retrieve a
character from a Big5 font.
@end table

Most of the above properties can only be set when the charset is
initialized, and cannot be changed later.
@xref{Charset Property Functions}.
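
The arithmetic behind the @code{graphic} property can be sketched in
Lisp as follows.  This is only an illustration of the mapping described
for that property; @code{my-position-code-to-font-index} is a
hypothetical helper and is not part of XEmacs.

@example
;; Hypothetical helper: map a position code (33 through 126) to a
;; font index, given the value of the `graphic' property (0 or 1).
(defun my-position-code-to-font-index (code graphic)
  (if (= graphic 1)
      (logior code 128)         ; right half: set the high bit
    code))                      ; left half: use the code unchanged

(my-position-code-to-font-index 33 0)
     @result{} 33
(my-position-code-to-font-index 33 1)
     @result{} 161
(my-position-code-to-font-index 126 1)
     @result{} 254
@end example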

@node Basic Charset Functions, Charset Property Functions, Charset Properties, Charsets
@subsection Basic Charset Functions

@defun find-charset charset-or-name
This function retrieves the charset of the given name.  If
@var{charset-or-name} is a charset object, it is simply returned.
Otherwise, @var{charset-or-name} should be a symbol.  If there is no
such charset, @code{nil} is returned.  Otherwise the associated charset
object is returned.
@end defun

@defun get-charset name
This function retrieves the charset of the given name.  It is the same
as @code{find-charset}, except that an error is signalled if there is no
such charset instead of returning @code{nil}.
@end defun
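
The difference between the two lookup functions can be seen in a short
sketch.  The name @code{no-such-charset} is made up for illustration
and is assumed not to be defined in your XEmacs.

@example
;; `find-charset' quietly returns nil for an unknown name.
(find-charset 'no-such-charset)
     @result{} nil
(charset-name (find-charset 'ascii))
     @result{} ascii

;; `get-charset' signals an error instead, which is caught here.
(condition-case nil
    (get-charset 'no-such-charset)
  (error 'not-found))
     @result{} not-found
@end example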

@defun charset-list
This function returns a list of the names of all defined charsets.
@end defun

@defun make-charset name doc-string props
This function defines a new character set.  This function is for use
with MULE support.  @var{name} is a symbol, the name by which the
character set is normally referred to.  @var{doc-string} is a string
describing the character set.  @var{props} is a property list
describing the specific nature of the character set.  The recognized
properties are @code{registry}, @code{dimension}, @code{columns},
@code{chars}, @code{final}, @code{graphic}, @code{direction}, and
@code{ccl-program}, as previously described.
@end defun
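
A call might look like the following sketch.  The charset name, the
documentation string and the registry are all hypothetical, and the
example assumes that the final byte @code{?9} (in the range reserved
for user-defined sets) is not already taken by another 94-character,
dimension-1 charset in your XEmacs.

@example
;; Every value below is illustrative only.
(make-charset 'my-test-charset
              "A hypothetical 94-character test charset."
              '(registry "MyTest-0"
                dimension 1
                chars 94
                columns 1
                final ?9
                graphic 0
                direction l2r))
@end example

Because most of these properties cannot be changed once the charset
exists, it is worth checking the property list carefully before
evaluating such a form.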

@defun make-reverse-direction-charset charset new-name
This function makes a charset equivalent to @var{charset} but which goes
in the opposite direction.  @var{new-name} is the name of the new
charset.  The new charset is returned.
@end defun

@defun charset-from-attributes dimension chars final &optional direction
This function returns a charset with the given @var{dimension},
@var{chars}, @var{final}, and @var{direction}.  If @var{direction} is
omitted, both directions will be checked (left-to-right will be returned
if character sets exist for both directions).
@end defun

@defun charset-reverse-direction-charset charset
This function returns the charset (if any) with the same dimension,
number of characters, and final byte as @var{charset}, but which is
displayed in the opposite direction.
@end defun
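
As a sketch of how a charset is looked up by its ISO 2022 attributes,
the following forms use values from the table of predefined charsets
(@pxref{Predefined Charsets}).  The results shown are what that table
implies; they assume the predefined charsets have not been altered.

@example
;; Dimension 2, 94 characters per dimension, final byte `B'.
(charset-name (charset-from-attributes 2 94 ?B))
     @result{} japanese-jisx0208

;; Dimension 1, 96 characters, final byte `H', right-to-left.
(charset-name (charset-from-attributes 1 96 ?H 'r2l))
     @result{} hebrew-iso8859-8
@end example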

@node Charset Property Functions, Predefined Charsets, Basic Charset Functions, Charsets
@subsection Charset Property Functions

All of these functions accept either a charset name or charset object.

@defun charset-property charset prop
This function returns property @var{prop} of @var{charset}.
@xref{Charset Properties}.
@end defun

Convenience functions are also provided for retrieving individual
properties of a charset.

@defun charset-name charset
This function returns the name of @var{charset}.  This will be a symbol.
@end defun

@defun charset-description charset
This function returns the documentation string of @var{charset}.
@end defun

@defun charset-registry charset
This function returns the registry of @var{charset}.
@end defun

@defun charset-dimension charset
This function returns the dimension of @var{charset}.
@end defun

@defun charset-chars charset
This function returns the number of characters per dimension of
@var{charset}.
@end defun

@defun charset-width charset
This function returns the number of display columns per character (in
TTY mode) of @var{charset}.
@end defun

@defun charset-direction charset
This function returns the display direction of @var{charset}, either
@code{l2r} or @code{r2l}.
@end defun

@defun charset-iso-final-char charset
This function returns the final byte of the ISO 2022 escape sequence
designating @var{charset}.
@end defun

@defun charset-iso-graphic-plane charset
This function returns either 0 or 1, depending on whether the position
codes of characters in @var{charset} map to the left or right half
of their font, respectively.
@end defun

@defun charset-ccl-program charset
This function returns the CCL program, if any, for converting
position codes of characters in @var{charset} into font indices.
@end defun
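
The accessors can be tried on the predefined charsets.  The results
below are the values implied by the table of predefined charsets
(@pxref{Predefined Charsets}), assuming an XEmacs built with MULE in
which those charsets are unmodified.

@example
(charset-dimension 'japanese-jisx0208)
     @result{} 2
(charset-chars 'japanese-jisx0208)
     @result{} 94
(charset-width 'japanese-jisx0208)
     @result{} 2
(charset-iso-final-char 'japanese-jisx0208)
     @result{} ?B
(charset-iso-graphic-plane 'latin-iso8859-2)
     @result{} 1
(charset-direction 'hebrew-iso8859-8)
     @result{} r2l

;; The generic accessor returns the same information.
(charset-property 'japanese-jisx0208 'dimension)
     @result{} 2
@end example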

The two properties of a charset that can currently be set after the
charset has been created are the CCL program and the font registry.

@defun set-charset-ccl-program charset ccl-program
This function sets the @code{ccl-program} property of @var{charset} to
@var{ccl-program}.
@end defun

@defun set-charset-registry charset registry
This function sets the @code{registry} property of @var{charset} to
@var{registry}.
@end defun

@node Predefined Charsets, , Charset Property Functions, Charsets
@subsection Predefined Charsets

The following charsets are predefined in the C code.

@example
Name                     Type  Fi Gr Dir Registry
--------------------------------------------------------------
ascii                    94    B  0  l2r ISO8859-1
control-1                94       0  l2r ---
latin-iso8859-1          96    A  1  l2r ISO8859-1
latin-iso8859-2          96    B  1  l2r ISO8859-2
latin-iso8859-3          96    C  1  l2r ISO8859-3
latin-iso8859-4          96    D  1  l2r ISO8859-4
cyrillic-iso8859-5       96    L  1  l2r ISO8859-5
arabic-iso8859-6         96    G  1  r2l ISO8859-6
greek-iso8859-7          96    F  1  l2r ISO8859-7
hebrew-iso8859-8         96    H  1  r2l ISO8859-8
latin-iso8859-9          96    M  1  l2r ISO8859-9
thai-tis620              96    T  1  l2r TIS620
katakana-jisx0201        94    I  1  l2r JISX0201.1976
latin-jisx0201           94    J  0  l2r JISX0201.1976
japanese-jisx0208-1978   94x94 @@  0  l2r JISX0208.1978
japanese-jisx0208        94x94 B  0  l2r JISX0208.19(83|90)
japanese-jisx0212        94x94 D  0  l2r JISX0212
chinese-gb2312           94x94 A  0  l2r GB2312
chinese-cns11643-1       94x94 G  0  l2r CNS11643.1
chinese-cns11643-2       94x94 H  0  l2r CNS11643.2
chinese-big5-1           94x94 0  0  l2r Big5
chinese-big5-2           94x94 1  0  l2r Big5
korean-ksc5601           94x94 C  0  l2r KSC5601
composite                96x96    0  l2r ---
@end example

The following charsets are predefined in the Lisp code.

@example
Name                     Type  Fi Gr Dir Registry
--------------------------------------------------------------
arabic-digit             94    2  0  l2r MuleArabic-0
arabic-1-column          94    3  0  r2l MuleArabic-1
arabic-2-column          94    4  0  r2l MuleArabic-2
sisheng                  94    0  0  l2r sisheng_cwnn\|OMRON_UDC_ZH
chinese-cns11643-3       94x94 I  0  l2r CNS11643.1
chinese-cns11643-4       94x94 J  0  l2r CNS11643.1
chinese-cns11643-5       94x94 K  0  l2r CNS11643.1
chinese-cns11643-6       94x94 L  0  l2r CNS11643.1
chinese-cns11643-7       94x94 M  0  l2r CNS11643.1
ethiopic                 94x94 2  0  l2r Ethio
ascii-r2l                94    B  0  r2l ISO8859-1
ipa                      96    0  1  l2r MuleIPA
vietnamese-viscii-lower  96    1  1  l2r VISCII1.1
vietnamese-viscii-upper  96    2  1  l2r VISCII1.1
@end example

For all of the above charsets, the dimension and number of columns are
the same.

Note that ASCII, Control-1, and Composite are handled specially.
This is why some of the fields are blank, and why some of the filled-in
fields (e.g. the type) are not really accurate.

@node MULE Characters, Composite Characters, Charsets, MULE
@section MULE Characters

@defun make-char charset arg1 &optional arg2
This function makes a multi-byte character from @var{charset} and octets
@var{arg1} and @var{arg2}.
@end defun

@defun char-charset character
This function returns the character set of char @var{character}.
@end defun

@defun char-octet character &optional n
This function returns the octet (i.e. position code) numbered @var{n}
(should be 0 or 1) of char @var{character}.  @var{n} defaults to 0 if omitted.
@end defun

@defun find-charset-region start end &optional buffer
This function returns a list of the charsets in the region between
@var{start} and @var{end}.  @var{buffer} defaults to the current buffer
if omitted.
@end defun

@defun find-charset-string string
This function returns a list of the charsets in @var{string}.
@end defun
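
The following sketch builds a two-octet character and takes it apart
again.  The position codes 36 and 34 are an arbitrary choice within the
valid range of 33 through 126; the example assumes an XEmacs built with
MULE, so that @code{japanese-jisx0208} is defined.

@example
(let ((ch (make-char 'japanese-jisx0208 36 34)))
  (list (charset-name (char-charset ch))  ; charset of the character
        (char-octet ch 0)                 ; first position code
        (char-octet ch 1)))               ; second position code
     @result{} (japanese-jisx0208 36 34)
@end example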

@node Composite Characters, Coding Systems, MULE Characters, MULE
@section Composite Characters

Composite characters are not yet completely implemented.

@defun make-composite-char string
This function converts a string into a single composite character.  The
character is the result of overstriking all the characters in the
string.
@end defun

@defun composite-char-string character
This function returns a string of the characters comprising a composite
character.
@end defun

@defun compose-region start end &optional buffer
This function composes the characters in the region from @var{start} to
@var{end} in @var{buffer} into one composite character.  The composite
character replaces the composed characters.  @var{buffer} defaults to
the current buffer if omitted.
@end defun

@defun decompose-region start end &optional buffer
This function decomposes any composite characters in the region from
@var{start} to @var{end} in @var{buffer}.  This converts each composite
character into one or more characters, the individual characters out of
which the composite character was formed.  Non-composite characters are
left as-is.  @var{buffer} defaults to the current buffer if omitted.
@end defun

@node Coding Systems, CCL, Composite Characters, MULE
@section Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
622
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
623 A coding system is an object that defines how text containing multiple
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
624 character sets is encoded into a stream of (typically 8-bit) bytes. The
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
625 coding system is used to decode the stream into a series of characters
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
626 (which may be from multiple charsets) when the text is read from a file
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
627 or process, and is used to encode the text back into the same format
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
628 when it is written out to a file or process.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
629
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
630 For example, many ISO-2022-compliant coding systems (such as Compound
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
631 Text, which is used for inter-client data under the X Window System) use
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
632 escape sequences to switch between different charsets -- Japanese Kanji,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
633 for instance, is invoked with @samp{ESC $ ( B}; ASCII is invoked with
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
634 @samp{ESC ( B}; and Cyrillic is invoked with @samp{ESC - L}. See
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
635 @code{make-coding-system} for more information.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
636
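To make the escape sequences visible, here is a small sketch that assumes Python's @code{iso2022_jp} codec (not an XEmacs facility). Note that this codec emits the short historical form @samp{ESC $ B} for Japanese Kanji rather than the @samp{ESC $ ( B} form shown above.

```python
ESC = b"\x1b"

# Mixed ASCII and Japanese text, encoded in a 7-bit ISO 2022 encoding.
data = "abc 日本語 xyz".encode("iso2022_jp")
print(data)

# One escape sequence switches to Kanji, a second switches back to ASCII.
print(data.count(ESC))

# Every byte stays in the 7-bit range; only the escapes select the charset.
print(all(byte < 0x80 for byte in data))
```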
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
637 Coding systems are normally identified using a symbol, and the symbol is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
638 accepted in place of the actual coding system object whenever a coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
639 system is called for. (This is similar to how faces and charsets work.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
640
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
641 @defun coding-system-p object
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
642 This function returns non-@code{nil} if @var{object} is a coding system.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
643 @end defun
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
644
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
645 @menu
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
646 * Coding System Types:: Classifying coding systems.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
647 * ISO 2022:: An international standard for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
648 charsets and encodings.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
649 * EOL Conversion:: Dealing with different ways of denoting
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
650 the end of a line.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
651 * Coding System Properties:: Properties of a coding system.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
652 * Basic Coding System Functions:: Working with coding systems.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
653 * Coding System Property Functions:: Retrieving a coding system's properties.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
654 * Encoding and Decoding Text:: Encoding and decoding text.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
655 * Detection of Textual Encoding:: Determining how text is encoded.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
656 * Big5 and Shift-JIS Functions:: Special functions for these non-standard
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
657 encodings.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
658 * Predefined Coding Systems:: Coding systems implemented by MULE.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
659 @end menu
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
660
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
661 @node Coding System Types, ISO 2022, , Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
662 @subsection Coding System Types
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
663
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
664 The coding system type determines the basic algorithm XEmacs will use to
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
665 decode or encode a data stream. Character encodings will be converted
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
666 to the MULE encoding, escape sequences processed, and newline sequences
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
667 converted to XEmacs's internal representation. There are three basic
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
668 classes of coding system type: no-conversion, ISO-2022, and special.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
669
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
670 No conversion allows you to look at the raw bytes of a file as stored.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
671 Since XEmacs is basically a text editor, "no conversion" does convert
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
672 newline conventions by default. (Use the @code{binary} coding system if this
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
673 is not desired.)
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
674
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
675 ISO 2022 (@pxref{ISO 2022}) is the basic international standard regulating
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
676 use of "coded character sets for the exchange of data", i.e., text
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
677 streams. ISO 2022 contains functions that make it possible to encode
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
678 text streams to comply with restrictions of the Internet mail system and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
679 de facto restrictions of most file systems (e.g., use of the separator
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
680 character in file names). Coding systems which are not ISO 2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
681 conformant can be difficult to handle. Perhaps more important, they are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
682 not adaptable to multilingual information interchange, with the obvious
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
683 exception of ISO 10646 (Unicode). (Unicode is partially supported by
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
684 XEmacs with the addition of the Lisp package ucs-conv.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
685
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
686 The special class of coding systems includes automatic detection, CCL (a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
687 "little language" embedded as an interpreter, useful for translating
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
688 between variants of a single character set), non-ISO-2022-conformant
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
689 encodings like Unicode, Shift JIS, and Big5, and MULE internal coding.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
690 (NB: this list is based on XEmacs 21.2. Terminology may vary slightly
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
691 for other versions of XEmacs and for GNU Emacs 20.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
692
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
693 @table @code
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
694 @item no-conversion
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
695 No conversion, for binary files, and a few special cases of non-ISO-2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
696 coding systems where conversion is done by hook functions (usually
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
697 implemented in CCL). On output, graphic characters that are not in
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
698 ASCII or Latin-1 will be replaced by a @samp{?}. (For a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
699 no-conversion-encoded buffer, these characters will only be present if
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
700 you explicitly insert them.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
701 @item iso2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
702 Any ISO-2022-compliant encoding. Among others, this includes JIS (the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
703 Japanese encoding commonly used for e-mail), national variants of EUC
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
704 (the standard Unix encoding for Japanese and other languages), and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
705 Compound Text (an encoding used in X11). You can specify more specific
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
706 information about the conversion with the @var{flags} argument.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
707 @item ucs-4
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
708 ISO 10646 UCS-4 encoding. A 31-bit fixed-width superset of Unicode.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
709 @item utf-8
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
710 ISO 10646 UTF-8 encoding. A ``file system safe'' transformation format
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
711 that can be used with both UCS-4 and Unicode.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
712 @item undecided
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
713 Automatic conversion. XEmacs attempts to detect the coding system used
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
714 in the file.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
715 @item shift-jis
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
716 Shift-JIS (a Japanese encoding commonly used in PC operating systems).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
717 @item big5
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
718 Big5 (the encoding commonly used for Traditional Chinese in Taiwan).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
719 @item ccl
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
720 The conversion is performed using a user-written pseudo-code program.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
721 CCL (Code Conversion Language) is the name of this pseudo-code. For
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
722 example, CCL is used to map KOI8-R characters (an encoding for Russian
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
723 Cyrillic) to ISO 8859-5 (the form used internally by MULE).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
724 @item internal
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
725 Write out or read in the raw contents of the memory representing the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
726 buffer's text. This is primarily useful for debugging purposes, and is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
727 only enabled when XEmacs has been compiled with @code{DEBUG_XEMACS} set
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
728 (the @samp{--debug} configure option). @strong{Warning}: Reading in a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
729 file using @code{internal} conversion can result in an internal
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
730 inconsistency in the memory representing a buffer's text, which will
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
731 produce unpredictable results and may cause XEmacs to crash. Under
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
732 normal circumstances you should never use @code{internal} conversion.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
733 @end table
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
734
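A few of the distinctions in the table above can be observed with ordinary codecs. This sketch assumes Python's @code{utf-32-be} codec as a rough stand-in for @code{ucs-4}, and @code{latin-1} with replacement as an analogy for the @samp{?} substitution described under @code{no-conversion}; neither is the XEmacs implementation.

```python
text = "aé語"

# Fixed width: every character occupies four bytes.
ucs4 = text.encode("utf-32-be")

# Variable width: one, two and three bytes for these three characters.
utf8 = text.encode("utf-8")

print(len(ucs4), len(utf8))

# Characters outside ASCII and Latin-1 cannot be written, so they
# are replaced by "?" on output.
latin1 = text.encode("latin-1", errors="replace")
print(latin1)
```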
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
735 @node ISO 2022, EOL Conversion, Coding System Types, Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
736 @subsection ISO 2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
737
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
738 This section briefly describes the ISO 2022 encoding standard. A more
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
739 thorough treatment is available in the original document of ISO
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
740 2022 as well as various national standards (such as JIS X 0202).
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
741
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
742 Character sets (@dfn{charsets}) are classified into the following four
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
743 categories, according to the number of characters in the charset:
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
744 94-charset, 96-charset, 94x94-charset, and 96x96-charset. This means
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
745 that although an ISO 2022 coding system may have variable width
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
746 characters, each charset used is fixed-width (in contrast to the MULE
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
747 character set and UTF-8, for example).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
748
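The four category names encode the size of each charset directly. As a quick arithmetic check (the function name @code{charset_size} is made up for this illustration):

```python
def charset_size(chars_per_byte, dimension):
    """Number of code points in a charset with the given shape."""
    return chars_per_byte ** dimension

categories = {
    "94-charset": (94, 1),
    "96-charset": (96, 1),
    "94x94-charset": (94, 2),
    "96x96-charset": (96, 2),
}

for name, (chars_per_byte, dimension) in categories.items():
    print(name, charset_size(chars_per_byte, dimension))
```

Within one charset the dimension, and hence the number of bytes per character, never varies; only the choice of charset changes the width.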
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
749 ISO 2022 provides for switching between character sets via escape
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
750 sequences. This switching is somewhat complicated, because ISO 2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
751 provides for both legacy applications like Internet mail that accept
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
752 only 7 significant bits in some contexts (RFC 822 headers, for example),
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
753 and more modern "8-bit clean" applications. It also provides for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
754 compact and transparent representation of languages like Japanese which
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
755 mix ASCII and a national script (even outside of computer programs).
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
756
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
757 First, ISO 2022 codified prevailing practice by dividing the code space
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
758 into "control" and "graphic" regions. The code points 0x00-0x1F and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
759 0x80-0x9F are reserved for "control characters", while "graphic
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
760 characters" must be assigned to code points in the regions 0x20-0x7F and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
761 0xA0-0xFF. The positions 0x20 and 0x7F are special, and under some
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
762 circumstances must be assigned the graphic character "ASCII SPACE" and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
763 the control character "ASCII DEL" respectively.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
764
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
765 The various regions are given the name C0 (0x00-0x1F), GL (0x20-0x7F),
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
766 C1 (0x80-0x9F), and GR (0xA0-0xFF). GL and GR stand for "graphic left"
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
767 and "graphic right", respectively, because of the standard method of
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
768 displaying graphic character sets in tables with the high nibble indexing
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
769 columns and the low nibble indexing rows. I don't find it very intuitive,
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
770 but these are called "registers".
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
771
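The division of the 8-bit code space into these four regions amounts to a simple range test. A minimal sketch, using the ranges exactly as given above (the function name @code{region} is hypothetical):

```python
def region(byte):
    """Classify an 8-bit value as C0, GL, C1 or GR."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit value")
    if byte < 0x20:
        return "C0"   # 0x00-0x1F: control characters
    if byte < 0x80:
        return "GL"   # 0x20-0x7F: graphic left
    if byte < 0xA0:
        return "C1"   # 0x80-0x9F: control characters
    return "GR"       # 0xA0-0xFF: graphic right

for value in (0x0A, 0x41, 0x9B, 0xE9):
    print(hex(value), region(value))
```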
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
772 An ISO 2022-conformant encoding for a graphic character set must use a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
773 fixed number of bytes per character, and the values must fit into a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
774 single register; that is, each byte must range over either 0x20-0x7F, or
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
775 0xA0-0xFF. It is not allowed to extend the range of the repertoire of a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
776 character set by using both ranges at the same time. This is why a standard
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
777 character set such as ISO 8859-1 is actually considered by ISO 2022 to
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
778 be an aggregation of two character sets, ASCII and LATIN-1, and why it
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
779 is technically incorrect to refer to ISO 8859-1 as "Latin 1". Also, a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
780 single character's bytes must all be drawn from the same register; this
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
781 is why Shift JIS (for Japanese) and Big5 (for Chinese) are not ISO
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
782 2022-compatible encodings.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
783
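The same-register rule can be tested mechanically. The sketch below assumes Python's @code{euc_jp} and @code{shift_jis} codecs; the helper names are made up for this illustration.

```python
def region(byte):
    """Classify an 8-bit value as C0, GL, C1 or GR."""
    if byte < 0x20:
        return "C0"
    if byte < 0x80:
        return "GL"
    if byte < 0xA0:
        return "C1"
    return "GR"

def single_graphic_register(char_bytes):
    """True if all bytes of one character lie in GL, or all lie in GR."""
    return {region(b) for b in char_bytes} in ({"GL"}, {"GR"})

for codec in ("euc_jp", "shift_jis"):
    raw = "本".encode(codec)
    print(codec, raw.hex(), single_graphic_register(raw))
```

For this character the EUC-JP bytes both fall in GR, while the Shift JIS bytes fall in two different regions, which is the property that rules Shift JIS out.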
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
784 The reason for this restriction becomes clear when you attempt to define
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
785 an efficient, robust encoding for a language like Japanese. Like ISO
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
786 8859, Japanese encodings are aggregations of several character sets. In
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
787 practice, the vast majority of characters are drawn from the "JIS Roman"
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
788 character set (a derivative of ASCII; it won't hurt to think of it as
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
789 ASCII) and the JIS X 0208 standard "basic Japanese" character set
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
790 including not only ideographic characters ("kanji") but syllabic
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
791 Japanese characters ("kana"), a wide variety of symbols, and many
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
792 alphabetic characters (Roman, Greek, and Cyrillic) as well. Although
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
793 JIS X 0208 includes the whole Roman alphabet, as a 2-byte code it is not
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
794 suited to programming; thus the inclusion of ASCII in the standard
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
795 Japanese encodings.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
796
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
797 For normal Japanese text such as in newspapers, a broad repertoire of
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
798 approximately 3000 characters is used. Evidently this won't fit into
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
799 one byte; two must be used. But much of the text processed by Japanese
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
800 computers is computer source code, nearly all of which is ASCII. A not
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
801 insignificant portion of ordinary text is English (as such or as
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
802 borrowed Japanese vocabulary) or other languages which can be represented
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
803 at least approximately in ASCII, as well. It seems reasonable then to
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
804 represent ASCII in one byte, and JIS X 0208 in two. And this is exactly
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
805 what the Extended Unix Code for Japanese (EUC-JP) does. ASCII is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
806 invoked to the GL register, and JIS X 0208 is invoked to the GR
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
807 register. Thus, each byte can be tested for its character set by
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
808 looking at the high bit; if set, it is Japanese, if clear, it is ASCII.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
809 Furthermore, since control characters like newline can never be part of
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
810 a graphic character, even in the case of corruption in transmission the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
811 stream will be resynchronized at every line break, on the order of 60-80
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
812 bytes. This coding system requires no escape sequences or special
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
813 control codes to represent 99.9% of all Japanese text.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
814
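The high-bit test described above is enough to split an EUC-JP stream into ASCII and Japanese runs without decoding it. A minimal sketch, assuming Python's @code{euc_jp} codec to produce the input; the function @code{split_euc_jp} is hypothetical, and it labels every high-bit byte "japanese" without distinguishing the subsidiary character sets.

```python
def split_euc_jp(data):
    """Split EUC-JP bytes into (charset, bytes) runs by testing the high bit."""
    runs = []
    for byte in data:
        kind = "japanese" if byte & 0x80 else "ascii"
        if runs and runs[-1][0] == kind:
            runs[-1][1].append(byte)     # extend the current run
        else:
            runs.append((kind, bytearray([byte])))  # start a new run
    return [(kind, bytes(chunk)) for kind, chunk in runs]

stream = "XEmacs は editor".encode("euc_jp")
for kind, chunk in split_euc_jp(stream):
    print(kind, chunk)
```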
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
815 Note carefully the distinction between the character sets (ASCII and JIS
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
816 X 0208), the encoding (EUC-JP), and the coding system (ISO 2022). The
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
817 JIS X 0208 character set is used in three different encodings for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
818 Japanese, but in ISO-2022-JP it is invoked into GL (so the high bit is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
819 always clear), in EUC-JP it is invoked into GR (setting the high bit in
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
820 the process), and in Shift JIS the high bit may be set or reset, and the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
821 significant bits are shifted within the 16-bit character so that the two
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
822 main character sets can coexist with a third (the "halfwidth katakana"
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
823 of JIS X 0201). As the name implies, the ISO-2022-JP encoding is also a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
824 version of the ISO-2022 coding system.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
825
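The relationship between the three encodings can be checked on a single character. This sketch assumes Python's @code{iso2022_jp}, @code{euc_jp} and @code{shift_jis} codecs, and assumes that the ISO-2022-JP output wraps the two character bytes in a three-byte escape sequence on each side.

```python
ch = "語"

# ISO-2022-JP: strip the leading and trailing escape sequences to get
# the two bytes as invoked into GL (high bit clear).
jis = ch.encode("iso2022_jp")[3:-3]

# EUC-JP: the same two bytes invoked into GR (high bit set).
euc = ch.encode("euc_jp")

# Shift JIS: the bits are rearranged, not merely offset.
sjis = ch.encode("shift_jis")

print(jis.hex(), euc.hex(), sjis.hex())
print(bytes(b | 0x80 for b in jis) == euc)
```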
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
826 In order to systematically treat subsidiary character sets (like the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
827 "halfwidth katakana" already mentioned, and the "supplementary kanji" of
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
828 JIS X 0212), four further registers are defined: G0, G1, G2, and G3.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
829 Unlike GL and GR, they are not logically distinguished by internal
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
830 format. Instead, the process of "invocation" mentioned earlier is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
831 broken into two steps: first, a character set is @dfn{designated} to one
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
832 of the registers G0-G3 by use of an @dfn{escape sequence} of the form:
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
833
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
834 @example
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
835 ESC [@var{I}] @var{I} @var{F}
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
836 @end example
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
837
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
838 where @var{I} is an intermediate character or characters in the range
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
839 0x20-0x3F, and @var{F}, from the range 0x30-0x7F, is the final
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
840 character identifying this charset. (Final characters in the range
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
841 0x30-0x3F are reserved for private use and will never have a publicly
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
842 registered meaning.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
843
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
844 Then that register is @dfn{invoked} to either GL or GR, either
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
845 automatically (designations to G0 normally involve invocation to GL as
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
846 well), or by use of shifting (affecting only the following character in
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
847 the data stream) or locking (effective until the next designation or
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
848 locking) control sequences. An encoding conformant to ISO 2022 is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
849 typically defined by designating the initial contents of the G0-G3
901
37e56e920ac5 [xemacs-hg @ 2002-07-05 20:35:47 by adrian]
adrian
parents: 775
diff changeset
850 registers, specifying a 7 or 8 bit environment, and specifying whether
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
851 further designations will be recognized.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
852
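The two-step process of designation and invocation can be pictured as a small state machine. The following is a toy model only: the class and method names are invented for this sketch, the initial invocation of G0 to GL and G1 to GR is an assumption, and the register contents used in the example follow the usual EUC-JP arrangement.

```python
class Iso2022State:
    """Toy model: four designation registers plus the GL and GR invocations."""

    def __init__(self, g0="ASCII", g1=None, g2=None, g3=None):
        self.g = [g0, g1, g2, g3]
        self.gl = 0  # index of the register currently invoked to GL
        self.gr = 1  # index of the register currently invoked to GR

    def designate(self, register, charset):
        """Step one: place a charset in one of G0-G3."""
        self.g[register] = charset

    def locking_shift(self, side, register):
        """Step two: invoke a register to GL or GR until further notice."""
        if side == "GL":
            self.gl = register
        elif side == "GR":
            self.gr = register
        else:
            raise ValueError("side must be 'GL' or 'GR'")

    def charset_for(self, byte, single_shift=None):
        """Charset of a byte; a single shift overrides for one character."""
        if single_shift is not None:
            return self.g[single_shift]
        return self.g[self.gr if byte & 0x80 else self.gl]

state = Iso2022State("ASCII", "JIS X 0208", "JIS X 0201 katakana", "JIS X 0212")
print(state.charset_for(0x41))
print(state.charset_for(0xC6))
print(state.charset_for(0xB1, single_shift=2))
```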
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
853 Some examples of character sets and the registered final characters
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
854 @var{F} used to designate them:
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
855
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
856 @need 1000
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
857 @table @asis
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
858 @item 94-charset
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
859 ASCII (B), left (J) and right (I) half of JIS X 0201, ...
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
860 @item 96-charset
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
861 Latin-1 (A), Latin-2 (B), Latin-3 (C), ...
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
862 @item 94x94-charset
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
863 GB2312 (A), JIS X 0208 (B), KSC5601 (C), ...
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
864 @item 96x96-charset
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
865 none for the moment
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
866 @end table
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
867
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
868 The meanings of the various characters in these sequences, where not
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
869 specified by the ISO 2022 standard (such as the ESC character), are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
870 assigned by @dfn{ECMA}, the European Computer Manufacturers Association.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
871
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
872 The meanings of the intermediate characters are:
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
873
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
874 @example
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
875 @group
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
876 $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
877 ( [0x28]: designate to G0 a 94-charset whose final byte is @var{F}.
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
878 ) [0x29]: designate to G1 a 94-charset whose final byte is @var{F}.
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
879 * [0x2A]: designate to G2 a 94-charset whose final byte is @var{F}.
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
880 + [0x2B]: designate to G3 a 94-charset whose final byte is @var{F}.
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
881 , [0x2C]: designate to G0 a 96-charset whose final byte is @var{F}.
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
882 - [0x2D]: designate to G1 a 96-charset whose final byte is @var{F}.
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
883 . [0x2E]: designate to G2 a 96-charset whose final byte is @var{F}.
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
884 / [0x2F]: designate to G3 a 96-charset whose final byte is @var{F}.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
885 @end group
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
886 @end example

The comma (0x2C), which designates a 96-charset to G0, may be used in
files read and written only by MULE, as a MULE extension, but it is not
valid in ISO 2022. (The reason is that in ISO 2022, G0 must be a
94-member character set, with 0x20 assigned the value SPACE and 0x7F
assigned the value DEL.)

Here are examples of designations:

@example
@group
ESC ( B                 : designate to G0 ASCII
ESC - A                 : designate to G1 Latin-1
ESC $ ( A or ESC $ A    : designate to G0 GB2312
ESC $ ( B or ESC $ B    : designate to G0 JIS X 0208
ESC $ ) C               : designate to G1 KSC5601
@end group
@end example
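
The designation sequences above can be sketched as a small parser. The
following Python fragment is only an illustration, not part of MULE: the
names @code{REGISTERS} and @code{parse_designation} are hypothetical, and
the sketch does not check that a short form is used only with the final
bytes for which it is allowed.

```python
ESC = 0x1B

# Intermediate byte -> (register, characters per dimension).
REGISTERS = {
    0x28: ("G0", 94), 0x29: ("G1", 94), 0x2A: ("G2", 94), 0x2B: ("G3", 94),
    0x2C: ("G0", 96),  # comma: MULE extension, not valid ISO 2022
    0x2D: ("G1", 96), 0x2E: ("G2", 96), 0x2F: ("G3", 96),
}

def parse_designation(seq):
    """Return (register, charset_type, final_byte) for one designation."""
    if len(seq) < 3 or seq[0] != ESC:
        raise ValueError("not a designation sequence")
    body = seq[1:]
    two_dim = body[0] == 0x24          # '$' marks a dimension-2 charset
    if two_dim:
        body = body[1:]
    if two_dim and len(body) == 1:
        # Short form such as ESC $ B: G0 and a 94x94 charset are implied.
        register, size = "G0", 94
    elif len(body) == 2 and body[0] in REGISTERS:
        register, size = REGISTERS[body[0]]
    else:
        raise ValueError("unrecognized designation")
    final = chr(body[-1])
    kind = "%dx%d" % (size, size) if two_dim else str(size)
    return register, kind + "-charset", final

print(parse_designation(b"\x1b(B"))     # long form, ASCII to G0
print(parse_designation(b"\x1b$B"))     # short form, JIS X 0208 to G0
print(parse_designation(b"\x1b$)C"))    # KSC5601 to G1
```

The parser reports only the register, the charset type, and the final
byte; mapping the final byte to a charset name would need the registry
of final bytes, which is outside this sketch.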

(The short forms used to designate GB2312 and JIS X 0208 are for
backwards compatibility; the long forms are preferred.)

To use a charset designated to G2 or G3, and to use a charset designated
to G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3
into GL. There are two types of invocation: Locking Shift (which stays
in effect until the next locking shift) and Single Shift (which affects
one character only).

Locking Shift is done as follows:

@example
LS0 or SI (0x0F): invoke G0 into GL
LS1 or SO (0x0E): invoke G1 into GL
LS2:  invoke G2 into GL
LS3:  invoke G3 into GL
LS1R: invoke G1 into GR
LS2R: invoke G2 into GR
LS3R: invoke G3 into GR
@end example

Single Shift is done as follows:

@example
@group
SS2 or ESC N: invoke G2 into GL
SS3 or ESC O: invoke G3 into GL
@end group
@end example
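
As a rough illustration of the difference between the two kinds of
shift, the following Python sketch tracks which register each graphic
byte of a 7-bit stream is read from. It follows the tables as written
(SI, SO, ESC N, ESC O), assumes every charset has dimension 1, and
ignores all other escape sequences; @code{track_shifts} is a hypothetical
name, not an XEmacs function.

```python
SI, SO, ESC = 0x0F, 0x0E, 0x1B

def track_shifts(data):
    """Yield (register, byte) for each graphic byte of a 7-bit stream."""
    gl = "G0"            # register currently invoked into GL
    single = None        # register selected by a pending single shift
    i = 0
    while i < len(data):
        b = data[i]
        if b == SI:                      # LS0: invoke G0 into GL
            gl = "G0"
        elif b == SO:                    # LS1: invoke G1 into GL
            gl = "G1"
        elif b == ESC and data[i + 1:i + 2] in (b"N", b"O"):
            # SS2 / SS3: affects only the next character
            single = "G2" if data[i + 1:i + 2] == b"N" else "G3"
            i += 1                       # skip the N or O as well
        else:
            yield (single or gl, b)
            single = None                # a single shift is used up
        i += 1

sample = b"a" + b"\x0e" + b"b" + b"\x1bN" + b"c" + b"d" + b"\x0f" + b"e"
for register, byte in track_shifts(sample):
    print(register, chr(byte))
```

In the sample, the locking shift SO keeps G1 selected for both
@samp{b} and @samp{d}, while the single shift ESC N selects G2 for
@samp{c} only.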

The shift functions (such as LS1R and SS3) are represented by control
characters (from C1) in 8-bit environments and by escape sequences in
7-bit environments.

(#### Ben says: I think the above is slightly incorrect. It appears that
SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and
ESC O behave as indicated. The above definitions will not parse
EUC-encoded text correctly, and it looks like the code in mule-coding.c
has similar problems.)

ISO 2022 therefore allows many different ways of encoding multilingual
text. Many coding systems in actual use, such as X11's Compound Text,
Japanese JUNET code, and so-called EUC (Extended UNIX Code), are
variants of ISO 2022.

In MULE, we characterize a version of ISO 2022 by the following
attributes:

@enumerate
@item
The character sets initially designated to G0 through G3.
@item
Whether short form designations are allowed for Japanese and Chinese.
@item
Whether ASCII should be designated to G0 before control characters.
@item
Whether ASCII should be designated to G0 at the end of line.
@item
Whether a 7-bit or an 8-bit environment is used.
@item
Whether Locking Shifts are used or not.
@item
Whether to use ASCII or the variant JIS X 0201-1976-Roman.
@item
Whether to use JIS X 0208-1983 or the older version JIS X 0208-1976.
@end enumerate

(The last two are only for Japanese.)

By specifying these attributes, you can create any variant of ISO 2022.

Here are several examples:

@example
@group
ISO-2022-JP -- Coding system used in Japanese email (RFC 1468).
        1. G0 <- ASCII, G1..3 <- never used.
        2. Yes.
        3. Yes.
        4. Yes.
        5. 7-bit environment.
        6. No.
        7. Use ASCII.
        8. Use JIS X 0208-1983.
@end group

@group
ctext -- X11 Compound Text
        1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used.
        2. No.
        3. No.
        4. Yes.
        5. 8-bit environment.
        6. No.
        7. Use ASCII.
        8. Use JIS X 0208-1983.
@end group

@group
euc-china -- Chinese EUC.  Often called the "GB encoding", but that is
             technically incorrect.
        1. G0 <- ASCII, G1 <- GB 2312, G2,3 <- never used.
        2. No.
        3. Yes.
        4. Yes.
        5. 8-bit environment.
        6. No.
        7. Use ASCII.
        8. Use JIS X 0208-1983.
@end group

@group
ISO-2022-KR -- Coding system used in Korean email.
        1. G0 <- ASCII, G1 <- KSC 5601, G2,3 <- never used.
        2. No.
        3. Yes.
        4. Yes.
        5. 7-bit environment.
        6. Yes.
        7. Use ASCII.
        8. Use JIS X 0208-1983.
@end group
@end example

MULE creates all of these coding systems by default.
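
Some of the attributes listed above can be observed from outside XEmacs.
The following sketch uses Python's standard codecs, which are
independent of MULE; it assumes that the Python codec names
@code{iso2022_jp}, @code{iso2022_kr}, and @code{gb2312} correspond to
ISO-2022-JP, ISO-2022-KR, and Chinese EUC respectively. The helper
@code{is_seven_bit} is a hypothetical name.

```python
# Sample text, written with escapes so that this example stays ASCII.
JAPANESE = "\u65e5\u672c\u8a9e"   # "Japanese language"
KOREAN = "\ud55c\uae00"           # "Hangul"
CHINESE = "\u4e2d\u6587"          # "Chinese language"

def is_seven_bit(data):
    """True if no byte has its high bit set (a 7-bit environment)."""
    return all(b < 0x80 for b in data)

jp = JAPANESE.encode("iso2022_jp")   # ISO-2022-JP
kr = KOREAN.encode("iso2022_kr")     # ISO-2022-KR
cn = CHINESE.encode("gb2312")        # EUC encoding of GB 2312

# ISO-2022-JP: JIS X 0208 is designated to G0 with the short form
# ESC $ B, and ASCII is designated again (ESC ( B) at the end.
print(jp.startswith(b"\x1b$B"), jp.endswith(b"\x1b(B"))

# The two mail encodings use a 7-bit environment; Chinese EUC does not.
print(is_seven_bit(jp), is_seven_bit(kr), is_seven_bit(cn))

# ISO-2022-KR uses a locking shift (SO, 0x0E) to reach G1.
print(b"\x0e" in kr)
```

Printing @code{jp}, @code{kr}, and @code{cn} directly shows the escape
sequences and shift characters in context.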

@node EOL Conversion, Coding System Properties, ISO 2022, Coding Systems
@subsection EOL Conversion

The end-of-line type of a coding system, given by its @code{eol-type}
property, is one of the following:

@table @code
@item nil
Automatically detect the end-of-line type (LF, CRLF, or CR). Also
generate subsidiary coding systems named @code{@var{name}-unix},
@code{@var{name}-dos}, and @code{@var{name}-mac}, that are identical to
this coding system but have an @code{eol-type} value of @code{lf},
@code{crlf}, and @code{cr}, respectively.
@item lf
The end of a line is marked externally using ASCII LF. Since this is
also the way that XEmacs represents an end-of-line internally,
specifying this option results in no end-of-line conversion. This is
the standard format for Unix text files.
@item crlf
The end of a line is marked externally using ASCII CRLF. This is the
standard format for MS-DOS text files.
@item cr
The end of a line is marked externally using ASCII CR. This is the
standard format for Macintosh text files.
@item t
Automatically detect the end-of-line type but do not generate subsidiary
coding systems. (This value is converted to @code{nil} when stored
internally, and @code{coding-system-property} will return @code{nil}.)
@end table
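
The conversions described in this table can be sketched in a few lines
of Python. This is an illustration only: the function names are
hypothetical, and the detection heuristic (look for CRLF first, then CR,
then LF) is an assumption, not a description of how XEmacs detects the
end-of-line type.

```python
EOL_BYTES = {"lf": b"\n", "crlf": b"\r\n", "cr": b"\r"}

def detect_eol(data):
    """Guess the end-of-line type: 'crlf', 'cr', 'lf', or None."""
    if b"\r\n" in data:
        return "crlf"
    if b"\r" in data:
        return "cr"
    if b"\n" in data:
        return "lf"
    return None          # no line break seen, nothing to decide

def decode_eol(data, eol_type=None):
    """Convert external line endings to the internal LF form."""
    if eol_type is None:             # like nil: detect automatically
        eol_type = detect_eol(data)
    if eol_type in (None, "lf"):     # already in the internal form
        return data
    return data.replace(EOL_BYTES[eol_type], b"\n")

def encode_eol(data, eol_type):
    """Convert internal LF line endings to the external convention."""
    return data.replace(b"\n", EOL_BYTES[eol_type])

print(decode_eol(b"one\r\ntwo\r\n"))
print(encode_eol(b"one\ntwo\n", "cr"))
```

Because LF is the internal representation, the @code{lf} case returns
the data unchanged in both directions.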

@node Coding System Properties, Basic Coding System Functions, EOL Conversion, Coding Systems
@subsection Coding System Properties

@table @code
@item mnemonic
String to be displayed in the modeline when this coding system is
active.

@item eol-type
End-of-line conversion to be used. It should be one of the types
listed in @ref{EOL Conversion}.

@item eol-lf
The coding system which is the same as this one, except that it uses the
Unix line-breaking convention.

@item eol-crlf
The coding system which is the same as this one, except that it uses the
DOS line-breaking convention.

@item eol-cr
The coding system which is the same as this one, except that it uses the
Macintosh line-breaking convention.

@item post-read-conversion
Function called after a file has been read in, to perform the decoding.
Called with two arguments, @var{start} and @var{end}, denoting a region of
the current buffer to be decoded.

@item pre-write-conversion
Function called before a file is written out, to perform the encoding.
Called with two arguments, @var{start} and @var{end}, denoting a region of
the current buffer to be encoded.
@end table

The following additional properties are recognized if @var{type} is
@code{iso2022}:

@table @code
@item charset-g0
@itemx charset-g1
@itemx charset-g2
@itemx charset-g3
The character set initially designated to each of the G0 through G3
registers. The value should be one of

@itemize @bullet
@item
A charset object (designate that character set)
@item
@code{nil} (do not ever use this register)
@item
@code{t} (no character set is initially designated to the register, but
may be later on; this automatically sets the corresponding
@code{force-g*-on-output} property)
@end itemize

@item force-g0-on-output
@itemx force-g1-on-output
@itemx force-g2-on-output
@itemx force-g3-on-output
If non-@code{nil}, send an explicit designation sequence on output
before using the specified register.

@item short
If non-@code{nil}, use the short forms @samp{ESC $ @@}, @samp{ESC $ A},
and @samp{ESC $ B} on output in place of the full designation sequences
@samp{ESC $ ( @@}, @samp{ESC $ ( A}, and @samp{ESC $ ( B}.

@item no-ascii-eol
If non-@code{nil}, don't designate ASCII to G0 at each end of line on
output. Setting this to non-@code{nil} also suppresses other
state-resetting that normally happens at the end of a line.

@item no-ascii-cntl
If non-@code{nil}, don't designate ASCII to G0 before control characters
on output.

@item seven
If non-@code{nil}, use 7-bit environment on output. Otherwise, use 8-bit
environment.

@item lock-shift
If non-@code{nil}, use locking-shift (SO/SI) instead of single-shift or
designation by escape sequence.

@item no-iso6429
If non-@code{nil}, don't use ISO 6429's direction specification.

@item escape-quoted
If non-@code{nil}, literal control characters that are the same as the
beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in
particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3 (0x8F),
and CSI (0x9B)) are ``quoted'' with an escape character so that they can
be properly distinguished from an escape sequence. (Note that doing
this results in a non-portable encoding.) This encoding flag is used for
byte-compiled files. Note that ESC is a good choice for a quoting
character because there are no escape sequences whose second byte is a
character from the Control-0 or Control-1 character sets; this is
explicitly disallowed by the ISO 2022 standard.

@item input-charset-conversion
A list of conversion specifications, specifying conversion of characters
in one charset to another when decoding is performed. Each
specification is a list of two elements: the source charset, and the
destination charset.

@item output-charset-conversion
A list of conversion specifications, specifying conversion of characters
in one charset to another when encoding is performed. The form of each
specification is the same as for @code{input-charset-conversion}.
@end table
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1170
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1171 The following additional properties are recognized (and required) if
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1172 @var{type} is @code{ccl}:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1173
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1174 @table @code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1175 @item decode
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1176 CCL program used for decoding (converting to internal format).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1177
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1178 @item encode
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1179 CCL program used for encoding (converting to external format).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1180 @end table
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1181
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1182 The following properties are used internally: @code{eol-cr},
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1183 @code{eol-crlf}, @code{eol-lf}, and @code{base}.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1184
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1185 @node Basic Coding System Functions, Coding System Property Functions, Coding System Properties, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1186 @subsection Basic Coding System Functions
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1187
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1188 @defun find-coding-system coding-system-or-name
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1189 This function retrieves the coding system of the given name.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1190
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1191 If @var{coding-system-or-name} is a coding-system object, it is simply
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1192 returned. Otherwise, @var{coding-system-or-name} should be a symbol.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1193 If there is no such coding system, @code{nil} is returned. Otherwise
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1194 the associated coding system object is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1195 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1196
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1197 @defun get-coding-system name
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1198 This function retrieves the coding system of the given name. Same as
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1199 @code{find-coding-system} except an error is signalled if there is no
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1200 such coding system instead of returning @code{nil}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1201 @end defun
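
The difference between the two lookup functions can be sketched as follows. This is a minimal illustration, assuming a Mule-enabled XEmacs; @code{no-such-coding-system} is a made-up name used only to show the failure case.

```elisp
;; Sketch only: `no-such-coding-system' is a hypothetical, undefined name.
(let ((known (find-coding-system 'binary))
      (unknown (find-coding-system 'no-such-coding-system)))
  ;; `find-coding-system' yields an object for a defined name and
  ;; nil for an undefined one.
  (list (and known (coding-system-name known))
        unknown))

;; `get-coding-system' signals an error instead of returning nil,
;; so guard the call when the name may be undefined.
(condition-case err
    (get-coding-system 'no-such-coding-system)
  (error (message "No such coding system: %S" err)))
```

Prefer @code{find-coding-system} when a missing coding system is an expected case, and @code{get-coding-system} when it indicates a bug.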
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1202
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1203 @defun coding-system-list
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1204 This function returns a list of the names of all defined coding systems.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1205 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1206
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1207 @defun coding-system-name coding-system
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1208 This function returns the name of the given coding system.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1209 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1210
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1211 @defun coding-system-base coding-system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1212 This function returns the base coding system (the one with an undecided
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1213 EOL convention) of @var{coding-system}.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1214 @end defun
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1215
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1216 @defun make-coding-system name type &optional doc-string props
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1217 This function registers symbol @var{name} as a coding system.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1218
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1219 @var{type} describes the conversion method used and should be one of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1220 the types listed in @ref{Coding System Types}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1221
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1222 @var{doc-string} is a string describing the coding system.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1223
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1224 @var{props} is a property list, describing the specific nature of the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1225 coding system. Recognized properties are as in @ref{Coding System
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1226 Properties}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1227 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1228
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1229 @defun copy-coding-system old-coding-system new-name
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1230 This function copies @var{old-coding-system} to @var{new-name}. If
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1231 @var{new-name} does not name an existing coding system, a new one will
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1232 be created.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1233 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1234
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1235 @defun subsidiary-coding-system coding-system eol-type
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1236 This function returns the subsidiary coding system of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1237 @var{coding-system} with eol type @var{eol-type}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1238 @end defun
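
As a hedged sketch of how a base coding system relates to its subsidiaries: the example below assumes that @var{eol-type} is given as one of the symbols @code{lf}, @code{crlf}, or @code{cr}, and that the @code{big5} coding system is defined in the running XEmacs.

```elisp
;; Assumption: EOL-TYPE is one of the symbols lf, crlf, or cr.
(let* ((base (get-coding-system 'big5))
       ;; Ask for the variant of BASE that uses CRLF line endings.
       (dos (subsidiary-coding-system base 'crlf)))
  ;; Return the subsidiary's name together with the name of the
  ;; base coding system it was derived from.
  (list (coding-system-name dos)
        (coding-system-name (coding-system-base dos))))
```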
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1239
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1240 @node Coding System Property Functions, Encoding and Decoding Text, Basic Coding System Functions, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1241 @subsection Coding System Property Functions
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1242
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1243 @defun coding-system-doc-string coding-system
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1244 This function returns the doc string for @var{coding-system}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1245 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1246
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1247 @defun coding-system-type coding-system
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1248 This function returns the type of @var{coding-system}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1249 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1250
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1251 @defun coding-system-property coding-system prop
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1252 This function returns the @var{prop} property of @var{coding-system}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1253 @end defun
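
The three accessors above can be combined to inspect a coding system. This is a sketch, assuming that @code{big5} is defined and that @code{mnemonic} is among the properties accepted by @code{coding-system-property} (see @ref{Coding System Properties}).

```elisp
;; Assumption: `mnemonic' is a valid property name for this coding system.
(let ((cs (get-coding-system 'big5)))
  (list (coding-system-type cs)          ; the conversion type
        (coding-system-doc-string cs)    ; the descriptive string
        (coding-system-property cs 'mnemonic)))
```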
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1254
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1255 @node Encoding and Decoding Text, Detection of Textual Encoding, Coding System Property Functions, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1256 @subsection Encoding and Decoding Text
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1257
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1258 @defun decode-coding-region start end coding-system &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1259 This function decodes the text between @var{start} and @var{end} which
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1260 is encoded in @var{coding-system}. This is useful if you've read in
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1261 encoded text from a file without decoding it (e.g. you read in a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1262 JIS-formatted file but used the @code{binary} or @code{no-conversion} coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1263 system, so that it shows up as @samp{^[$B!<!+^[(B}). The length of the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1264 decoded text is returned. @var{buffer} defaults to the current buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1265 if unspecified.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1266 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1267
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1268 @defun encode-coding-region start end coding-system &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1269 This function encodes the text between @var{start} and @var{end} using
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1270 @var{coding-system}. This will, for example, convert Japanese
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1271 characters into byte sequences such as @samp{^[$B!<!+^[(B} if you use the JIS
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1272 encoding. The length of the encoded text is returned. @var{buffer}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1273 defaults to the current buffer if unspecified.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1274 @end defun
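
A round trip through both functions might look like the following sketch. It assumes a Mule-enabled XEmacs in which a coding system named @code{iso-2022-jp} is defined; the sample text is arbitrary.

```elisp
;; Assumption: the coding system `iso-2022-jp' exists in this XEmacs.
(with-temp-buffer
  (insert "plain ASCII sample")
  ;; Convert the buffer contents to the external representation ...
  (encode-coding-region (point-min) (point-max) 'iso-2022-jp)
  ;; ... and back to the internal representation.
  (decode-coding-region (point-min) (point-max) 'iso-2022-jp)
  (buffer-string))
```

Both calls operate on the current buffer here because the optional @var{buffer} argument is omitted.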
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1275
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1276 @node Detection of Textual Encoding, Big5 and Shift-JIS Functions, Encoding and Decoding Text, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1277 @subsection Detection of Textual Encoding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1278
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1279 @defun coding-category-list
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1280 This function returns a list of all recognized coding categories.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1281 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1282
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1283 @defun set-coding-priority-list list
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1284 This function changes the priority order of the coding categories.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1285 @var{list} should be a list of coding categories, in descending order of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1286 priority. Unspecified coding categories will be lower in priority than
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1287 all specified ones, in the same relative order they were in previously.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1288 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1289
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1290 @defun coding-priority-list
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1291 This function returns a list of coding categories in descending order of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1292 priority.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1293 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1294
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1295 @defun set-coding-category-system coding-category coding-system
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1296 This function changes the coding system associated with a coding category.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1297 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1298
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1299 @defun coding-category-system coding-category
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1300 This function returns the coding system associated with a coding category.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1301 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1302
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1303 @defun detect-coding-region start end &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1304 This function detects the coding system of the text in the region between
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1305 @var{start} and @var{end}. The returned value is a list of possible coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1306 systems ordered by priority. If only ASCII characters are found, it
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1307 returns @code{autodetect} or one of its subsidiary coding systems
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1308 according to a detected end-of-line type. Optional arg @var{buffer}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1309 defaults to the current buffer.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1310 @end defun
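
As a brief sketch of how detection and the priority list fit together, assuming a Mule-enabled XEmacs (the buffer contents are arbitrary sample text):

```elisp
;; Inspect the current priority order of the coding categories.
(coding-priority-list)

;; Run detection over a whole temporary buffer.
(with-temp-buffer
  (insert "only ASCII here\n")
  (detect-coding-region (point-min) (point-max)))
```

Changing the order with @code{set-coding-priority-list} before calling @code{detect-coding-region} affects the order in which candidate coding systems are reported.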
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1311
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1312 @node Big5 and Shift-JIS Functions, Predefined Coding Systems, Detection of Textual Encoding, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1313 @subsection Big5 and Shift-JIS Functions
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1314
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1315 These are special functions for working with the non-standard
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1316 Shift-JIS and Big5 encodings.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1317
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1318 @defun decode-shift-jis-char code
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1319 This function decodes a JIS X 0208 character encoded in the Shift-JIS coding system.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1320 @var{code} is the character code in Shift-JIS as a cons of two bytes.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1321 The corresponding character is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1322 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1323
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1324 @defun encode-shift-jis-char character
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1325 This function encodes the JIS X 0208 character @var{character} in the
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1326 Shift-JIS coding system. The corresponding character code in Shift-JIS
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1327 is returned as a cons of two bytes.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1328 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1329
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1330 @defun decode-big5-char code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1331 This function decodes a character encoded in the Big5 coding system.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1332 @var{code} is the character code in Big5. The corresponding character
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1333 is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1334 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1335
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1336 @defun encode-big5-char character
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1337 This function encodes the Big5 character @var{character} in the Big5
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1338 coding system. The corresponding character code in Big5 is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1339 @end defun
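
The Shift-JIS pair can be exercised with a round trip. This sketch assumes a Mule-enabled XEmacs and uses the byte pair 0x82 0xA0, which is the Shift-JIS code for HIRAGANA LETTER A.

```elisp
;; The cons holds the two bytes of the Shift-JIS code, first byte in the car.
(let* ((code (cons #x82 #xA0))
       (ch (decode-shift-jis-char code)))
  ;; Encoding the decoded character should give back the same byte pair.
  (equal (encode-shift-jis-char ch) code))
```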
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1340
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1341 @node Predefined Coding Systems, , Big5 and Shift-JIS Functions, Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1342 @subsection Coding Systems Implemented
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1343
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1344 MULE initializes most of the commonly used coding systems at XEmacs's
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1345 startup. A few others are initialized only when the relevant language
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1346 environment is selected and support libraries are loaded. (NB: The
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1347 following list is based on XEmacs 21.2.19, the development branch at the
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1348 time of writing. The list may be somewhat different for other
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1349 versions. Recent versions of GNU Emacs 20 implement a few more rare
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1350 coding systems; work is being done to port these to XEmacs.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1351
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1352 Unfortunately, there is not a consistent naming convention for character
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1353 sets, and for practical purposes coding systems often take their name
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1354 from their principal character sets (ASCII, KOI8-R, Shift JIS). Others
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1355 take their names from the coding system (ISO-2022-JP, EUC-KR), and a few
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1356 from their non-text usages (internal, binary). To provide for this, and
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1357 for the fact that many coding systems have several common names, an
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1358 aliasing system is provided. Finally, some effort has been made to use
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1359 names that are registered as MIME charsets (this is why the name
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1360 @code{shift_jis} contains that un-Lisp-y underscore).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1361
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1362 There is a systematic naming convention regarding end-of-line (EOL)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1363 conventions for different systems. A coding system whose name ends in
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1364 "-unix" forces the assumption that lines are broken by newlines (0x0A).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1365 A coding system whose name ends in "-mac" forces the assumption that
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1366 lines are broken by ASCII CRs (0x0D). A coding system whose name ends
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1367 in "-dos" forces the assumption that lines are broken by CRLF sequences
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1368 (0x0D 0x0A). These subsidiary coding systems are automatically derived
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1369 from a base coding system. Use of the base coding system implies
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1370 autodetection of the text file convention. (The fact that the -unix,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1371 -mac, and -dos variants are derived from a base system results in them showing up
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1372 as "aliases" in `list-coding-systems'.) These subsidiaries have a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1373 consistent modeline indicator as well. "-dos" coding systems have ":T"
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1374 appended to their modeline indicator, while "-mac" coding systems have
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1375 ":t" appended (e.g., "ISO8:t" for iso-2022-8-mac).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1376
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1377 In the following table, each coding system is given with its mode line
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1378 indicator in parentheses. Non-textual coding systems are listed first,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1379 followed by textual coding systems and their aliases. (The coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1380 subsidiary modeline indicators ":T" and ":t" will be omitted from the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1381 table of coding systems.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1382
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1383 ### SJT 1999-08-23 Maybe should order these by language? Definitely
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1384 need language usage for the ISO-8859 family.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1385
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1386 Note that although true coding system aliases have been implemented for
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1387 XEmacs 21.2, the coding system initialization has not yet been converted
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1388 as of 21.2.19. So coding systems described as aliases have the same
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1389 properties as the aliased coding system, but will not be equal as Lisp
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1390 objects.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1391
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1392 @table @code
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1393
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1394 @item automatic-conversion
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1395 @itemx undecided
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1396 @itemx undecided-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1397 @itemx undecided-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1398 @itemx undecided-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1399
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1400 Modeline indicator: @code{Auto}. A type @code{undecided} coding system.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1401 Attempts to determine an appropriate coding system from file contents or
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1402 the environment.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1403
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1404 @item raw-text
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1405 @itemx no-conversion
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1406 @itemx raw-text-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1407 @itemx raw-text-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1408 @itemx raw-text-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1409 @itemx no-conversion-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1410 @itemx no-conversion-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1411 @itemx no-conversion-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1412
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1413 Modeline indicator: @code{Raw}. A type @code{no-conversion} coding system,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1414 which converts only line-break-codes. An implementation quirk means
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1415 that this coding system is also used for ISO8859-1.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1416
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1417 @item binary
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1418 Modeline indicator: @code{Binary}. A type @code{no-conversion} coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1419 system which does no character coding or EOL conversions. An alias for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1420 @code{raw-text-unix}.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1421
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1422 @item alternativnyj
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1423 @itemx alternativnyj-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1424 @itemx alternativnyj-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1425 @itemx alternativnyj-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1426
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1427 Modeline indicator: @code{Cy.Alt}. A type @code{ccl} coding system used for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1428 Alternativnyj, an encoding of the Cyrillic alphabet.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1429
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1430 @item big5
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1431 @itemx big5-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1432 @itemx big5-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1433 @itemx big5-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1434
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1435 Modeline indicator: @code{Zh/Big5}. A type @code{big5} coding system used for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1436 BIG5, the most common encoding of traditional Chinese as used in Taiwan.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1437
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
@item cn-gb-2312
@itemx cn-gb-2312-dos
@itemx cn-gb-2312-mac
@itemx cn-gb-2312-unix

Modeline indicator: @code{Zh-GB/EUC}. A type @code{iso2022} coding system used
for simplified Chinese (as used in the People's Republic of China), with
the @code{ascii} (G0), @code{chinese-gb2312} (G1), and @code{sisheng}
(G2) character sets initially designated. This is Chinese EUC (Extended
Unix Code).

@item ctext-hebrew
@itemx ctext-hebrew-dos
@itemx ctext-hebrew-mac
@itemx ctext-hebrew-unix

Modeline indicator: @code{CText/Hbrw}. A type @code{iso2022} coding system
for Hebrew, with the @code{ascii} (G0) and @code{hebrew-iso8859-8} (G1)
character sets initially designated.

@item ctext
@itemx ctext-dos
@itemx ctext-mac
@itemx ctext-unix

Modeline indicator: @code{CText}. A type @code{iso2022} 8-bit coding system
with the @code{ascii} (G0) and @code{latin-iso8859-1} (G1) character
sets initially designated. This is the X11 Compound Text Encoding. It is
often mistakenly recognized instead of EUC encodings; the usual cause is
an inappropriate setting of @code{coding-priority-list}.

@item escape-quoted

Modeline indicator: @code{ESC/Quot}. A type @code{iso2022} 8-bit coding
system with the @code{ascii} (G0) and @code{latin-iso8859-1} (G1)
character sets initially designated and escape quoting. It uses Unix EOL
conversion (i.e., no conversion) and is used for @file{.elc} files.

@item euc-jp
@itemx euc-jp-dos
@itemx euc-jp-mac
@itemx euc-jp-unix

Modeline indicator: @code{Ja/EUC}. A type @code{iso2022} 8-bit coding system
with @code{ascii} (G0), @code{japanese-jisx0208} (G1),
@code{katakana-jisx0201} (G2), and @code{japanese-jisx0212} (G3)
initially designated. This is Japanese EUC (Extended Unix Code).

@item euc-kr
@itemx euc-kr-dos
@itemx euc-kr-mac
@itemx euc-kr-unix

Modeline indicator: @code{ko/EUC}. A type @code{iso2022} 8-bit coding system
with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
designated. This is Korean EUC (Extended Unix Code).

@item hz-gb-2312
Modeline indicator: @code{Zh-GB/Hz}. A type @code{no-conversion} coding
system with Unix EOL convention (i.e., no conversion) using
post-read-decode and pre-write-encode functions to translate the Hz/ZW
coding system used for Chinese.

@item iso-2022-7bit
@itemx iso-2022-7bit-unix
@itemx iso-2022-7bit-dos
@itemx iso-2022-7bit-mac
@itemx iso-2022-7

Modeline indicator: @code{ISO7}. A type @code{iso2022} 7-bit coding system
with @code{ascii} (G0) initially designated. Other character sets must
be explicitly designated before they can be used.

@item iso-2022-7bit-ss2
@itemx iso-2022-7bit-ss2-dos
@itemx iso-2022-7bit-ss2-mac
@itemx iso-2022-7bit-ss2-unix

Modeline indicator: @code{ISO7/SS}. A type @code{iso2022} 7-bit coding system
with @code{ascii} (G0) initially designated. Other character sets must
be explicitly designated before they can be used. SS2 is used to invoke
a 96-charset, one character at a time.

@item iso-2022-8
@itemx iso-2022-8-dos
@itemx iso-2022-8-mac
@itemx iso-2022-8-unix

Modeline indicator: @code{ISO8}. A type @code{iso2022} 8-bit coding system
with @code{ascii} (G0) and @code{latin-iso8859-1} (G1) initially
designated. Other character sets must be explicitly designated before
they can be used. Neither single-shift nor locking-shift is used.

@item iso-2022-8bit-ss2
@itemx iso-2022-8bit-ss2-dos
@itemx iso-2022-8bit-ss2-mac
@itemx iso-2022-8bit-ss2-unix

Modeline indicator: @code{ISO8/SS}. A type @code{iso2022} 8-bit coding system
with @code{ascii} (G0) and @code{latin-iso8859-1} (G1) initially
designated. Other character sets must be explicitly designated before
they can be used. SS2 is used to invoke a 96-charset, one character at a
time.

@item iso-2022-int-1
@itemx iso-2022-int-1-dos
@itemx iso-2022-int-1-mac
@itemx iso-2022-int-1-unix

Modeline indicator: @code{INT-1}. A type @code{iso2022} 7-bit coding system
with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
designated. This is ISO-2022-INT-1.

@item iso-2022-jp-1978-irv
@itemx iso-2022-jp-1978-irv-dos
@itemx iso-2022-jp-1978-irv-mac
@itemx iso-2022-jp-1978-irv-unix

Modeline indicator: @code{Ja-78/7bit}. A type @code{iso2022} 7-bit coding
system. It exists for compatibility with old Japanese terminals; for the
details, consult the source.

@item iso-2022-jp
@itemx iso-2022-jp-2 (ISO7/SS)
@itemx iso-2022-jp-dos
@itemx iso-2022-jp-mac
@itemx iso-2022-jp-unix
@itemx iso-2022-jp-2-dos
@itemx iso-2022-jp-2-mac
@itemx iso-2022-jp-2-unix

Modeline indicator: @code{MULE/7bit}. A type @code{iso2022} 7-bit coding
system with @code{ascii} (G0) initially designated, and complex
specifications to ensure backward compatibility with old Japanese
systems. Used for mail and news in Japan. The "-2" versions also use SS2
to invoke a 96-charset, one character at a time.

@item iso-2022-kr
Modeline indicator: @code{Ko/7bit}. A type @code{iso2022} 7-bit coding
system with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
designated. Used for e-mail in Korea.

@item iso-2022-lock
@itemx iso-2022-lock-dos
@itemx iso-2022-lock-mac
@itemx iso-2022-lock-unix

Modeline indicator: @code{ISO7/Lock}. A type @code{iso2022} 7-bit coding
system with @code{ascii} (G0) initially designated, using Locking-Shift
to invoke a 96-charset.

@item iso-8859-1
@itemx iso-8859-1-dos
@itemx iso-8859-1-mac
@itemx iso-8859-1-unix

For implementation reasons, this is not a type @code{iso2022} coding
system, but rather an alias for the @code{raw-text} coding system.

@item iso-8859-2
@itemx iso-8859-2-dos
@itemx iso-8859-2-mac
@itemx iso-8859-2-unix

Modeline indicator: @code{MIME/Ltn-2}. A type @code{iso2022} coding
system with @code{ascii} (G0) and @code{latin-iso8859-2} (G1) initially
invoked.

@item iso-8859-3
@itemx iso-8859-3-dos
@itemx iso-8859-3-mac
@itemx iso-8859-3-unix

Modeline indicator: @code{MIME/Ltn-3}. A type @code{iso2022} coding system
with @code{ascii} (G0) and @code{latin-iso8859-3} (G1) initially
invoked.

@item iso-8859-4
@itemx iso-8859-4-dos
@itemx iso-8859-4-mac
@itemx iso-8859-4-unix

Modeline indicator: @code{MIME/Ltn-4}. A type @code{iso2022} coding system
with @code{ascii} (G0) and @code{latin-iso8859-4} (G1) initially
invoked.

@item iso-8859-5
@itemx iso-8859-5-dos
@itemx iso-8859-5-mac
@itemx iso-8859-5-unix

Modeline indicator: @code{ISO8/Cyr}. A type @code{iso2022} coding system with
@code{ascii} (G0) and @code{cyrillic-iso8859-5} (G1) initially invoked.

@item iso-8859-7
@itemx iso-8859-7-dos
@itemx iso-8859-7-mac
@itemx iso-8859-7-unix

Modeline indicator: @code{Grk}. A type @code{iso2022} coding system with
@code{ascii} (G0) and @code{greek-iso8859-7} (G1) initially invoked.

@item iso-8859-8
@itemx iso-8859-8-dos
@itemx iso-8859-8-mac
@itemx iso-8859-8-unix

Modeline indicator: @code{MIME/Hbrw}. A type @code{iso2022} coding system with
@code{ascii} (G0) and @code{hebrew-iso8859-8} (G1) initially invoked.

@item iso-8859-9
@itemx iso-8859-9-dos
@itemx iso-8859-9-mac
@itemx iso-8859-9-unix

Modeline indicator: @code{MIME/Ltn-5}. A type @code{iso2022} coding system
with @code{ascii} (G0) and @code{latin-iso8859-9} (G1) initially
invoked.

@item koi8-r
@itemx koi8-r-dos
@itemx koi8-r-mac
@itemx koi8-r-unix

Modeline indicator: @code{KOI8}. A type @code{ccl} coding system used for
KOI8-R, an encoding of the Cyrillic alphabet.

@item shift_jis
@itemx shift_jis-dos
@itemx shift_jis-mac
@itemx shift_jis-unix

Modeline indicator: @code{Ja/SJIS}. A type @code{shift-jis} coding system
implementing the Shift-JIS encoding for Japanese. The underscore in the
name matches the name of the MIME charset for this encoding.

@item tis-620
@itemx tis-620-dos
@itemx tis-620-mac
@itemx tis-620-unix

Modeline indicator: @code{TIS620}. A type @code{ccl} coding system for
Thai. The external encoding is defined by TIS620; the internal encoding
is peculiar to MULE and is called @code{thai-xtis}.

@item viqr

Modeline indicator: @code{VIQR}. A type @code{no-conversion} coding
system with Unix EOL convention (i.e., no conversion) using
post-read-decode and pre-write-encode functions to translate the VIQR
coding system for Vietnamese.

@item viscii
@itemx viscii-dos
@itemx viscii-mac
@itemx viscii-unix

Modeline indicator: @code{VISCII}. A type @code{ccl} coding system used
for VISCII 1.1 for Vietnamese. It differs slightly from VSCII; VISCII is
given priority by XEmacs.

@item vscii
@itemx vscii-dos
@itemx vscii-mac
@itemx vscii-unix

Modeline indicator: @code{VSCII}. A type @code{ccl} coding system used
for VSCII 1.1 for Vietnamese. It differs slightly from VISCII, which is
given priority by XEmacs. Use
@code{(prefer-coding-system 'vietnamese-vscii)} to give priority to VSCII.

@end table

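The entries in the table above give each coding system's type, modeline
indicator, and EOL variants. As a minimal sketch, assuming a
Mule-enabled XEmacs in which @code{find-coding-system},
@code{coding-system-type}, and @code{coding-system-property} behave as
their names suggest, the same information might be inspected from Lisp.
The helper @code{my-describe-coding-system} is a hypothetical name used
only for this illustration, and the property names @code{mnemonic} and
@code{eol-type} are assumptions.

@example
;; Hypothetical helper, not part of XEmacs.
(defun my-describe-coding-system (name)
  "Return a list (TYPE MNEMONIC EOL-TYPE) for coding system NAME, or nil."
  (let ((cs (find-coding-system name)))
    (when cs
      (list (coding-system-type cs)
            ;; Assumed property names; check them against your build.
            (coding-system-property cs 'mnemonic)
            (coding-system-property cs 'eol-type)))))

;; Compare a base coding system with one of its EOL variants.
(my-describe-coding-system 'euc-jp)
(my-describe-coding-system 'euc-jp-dos)
@end example

The helper returns @code{nil} for a name that is not defined, so it can
also be used to test whether a listed coding system is available in a
given build.
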
@node CCL, Category Tables, Coding Systems, MULE
@section CCL

CCL (Code Conversion Language) is a simple structured programming
language designed for character coding conversions. A CCL program is
compiled to CCL code (represented by a vector of integers) and executed
by the CCL interpreter embedded in Emacs. The CCL interpreter
implements a virtual machine with 8 registers called @code{r0}, ...,
@code{r7}, a number of control structures, and some I/O operators. Take
care when using register @code{r0} (used in implicit @dfn{set}
statements) and especially @code{r7} (used internally by several
statements and operations, notably for multiple return values and I/O
operations).

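To make the registers, control structures, and I/O operators described
above concrete, here is a minimal sketch of a CCL program that reads
bytes, upcases ASCII lower-case letters, and writes the result. It
assumes that @code{define-ccl-program} and
@code{ccl-execute-with-string} from @file{mule-ccl.el} accept the forms
shown; the program name @code{my-ccl-upcase-ascii} is hypothetical.
Integer codes are used instead of character literals to avoid depending
on how the compiler treats characters.

@example
;; Hypothetical program; the leading 1 is the buffer magnification.
(define-ccl-program my-ccl-upcase-ascii
  '(1
    ((loop
      (read r0)                 ; next input byte into r0
      (if (r0 >= 97)            ; 97 is the code of `a'
          (if (r0 <= 122)       ; 122 is the code of `z'
              (r0 -= 32)))      ; shift to the upper-case code
      (write r0)                ; emit the possibly modified byte
      (repeat))))               ; jump back to the top of the loop
  "Upcase ASCII lower-case letters, passing other bytes through.")

;; The status vector holds r0 through r7 plus an instruction counter.
(ccl-execute-with-string 'my-ccl-upcase-ascii (make-vector 9 0) "hello")
@end example

The sketch touches only @code{r0} explicitly and leaves @code{r7}
alone, in keeping with the caution above.
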
CCL is used for code conversion during process I/O and file I/O for
non-ISO2022 coding systems. (It is the only way for a user to specify a
code conversion function.) It is also used for calculating the code
point of an X11 font from a character code. However, since CCL is
designed as a powerful programming language, it can also be used for
more general computation where efficiency matters. A combination of
three or more arithmetic operations can be calculated faster by CCL than
by Emacs Lisp.

1733 @strong{Warning:} The code in @file{src/mule-ccl.c} and
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1734 @file{$packages/lisp/mule-base/mule-ccl.el} is the definitive
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1735 description of CCL's semantics. The previous version of this section
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1736 contained several typos and obsolete names left from earlier versions of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1737 MULE, and many may remain. (I am not an experienced CCL programmer; the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1738 few who know CCL well find writing English painful.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1739
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1740 A CCL program transforms an input data stream into an output data
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1741 stream. The input stream, held in a buffer of constant bytes, is left
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1742 unchanged. The buffer may be filled by an external input operation,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1743 taken from an Emacs buffer, or taken from a Lisp string. The output
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1744 buffer is a dynamic array of bytes, which can be written by an external
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1745 output operation, inserted into an Emacs buffer, or returned as a Lisp
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1746 string.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1747
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1748 A CCL program is a (Lisp) list containing two or three members. The
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1749 first member is the @dfn{buffer magnification}, which indicates the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1750 required minimum size of the output buffer as a multiple of the input
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1751 buffer. It is followed by the @dfn{main block} which executes while
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1752 there is input remaining, and an optional @dfn{EOF block} which is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1753 executed when the input is exhausted. Both the main block and the EOF
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1754 block are CCL blocks.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1755
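The three-part shape of a CCL program can be illustrated with a small
Python model. This is only a sketch of the description above, not XEmacs
code: the function names @code{split_ccl_program} and
@code{min_output_size} are hypothetical, and nested Python lists stand in
for the Lisp list.

```python
# Illustrative model only; nothing here is part of XEmacs.

def split_ccl_program(program):
    """Return (magnification, main_block, eof_block) for a program list."""
    if len(program) not in (2, 3):
        raise ValueError("a CCL program has two or three members")
    magnification = program[0]
    if not isinstance(magnification, int):
        raise TypeError("buffer magnification must be an integer")
    main_block = program[1]
    # The EOF block is optional; None stands for "absent" in this model.
    eof_block = program[2] if len(program) == 3 else None
    return magnification, main_block, eof_block


def min_output_size(program, input_size):
    """Minimum output buffer size: input size times the magnification."""
    magnification, _, _ = split_ccl_program(program)
    return magnification * input_size


# Python stand-in for the Lisp list
#   (1 ((read r0) (loop (write-read-repeat r0))))
copy_program = [1, [["read", "r0"], ["loop", ["write-read-repeat", "r0"]]]]
```

A program that may emit up to two output bytes per input byte would carry
a magnification of 2, so the model asks for at least twice the input size.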
A @dfn{CCL block} is either a CCL statement or a list of CCL statements.
A @dfn{CCL statement} is either a @dfn{set statement} (either an integer
or an @dfn{assignment}, which is a list of a register to receive the
assignment, an assignment operator, and an expression) or a @dfn{control
statement} (a list starting with a keyword, whose allowable syntax
depends on the keyword).

@menu
* CCL Syntax::          CCL program syntax in BNF notation.
* CCL Statements::      Semantics of CCL statements.
* CCL Expressions::     Operators and expressions in CCL.
* Calling CCL::         Running CCL programs.
* CCL Examples::        The encoding functions for Big5 and KOI-8.
@end menu

@node CCL Syntax, CCL Statements, , CCL
@comment Node, Next, Previous, Up
@subsection CCL Syntax

The full syntax of a CCL program in BNF notation:

@format
CCL_PROGRAM :=
        (BUFFER_MAGNIFICATION
         CCL_MAIN_BLOCK
         [ CCL_EOF_BLOCK ])

BUFFER_MAGNIFICATION := integer
CCL_MAIN_BLOCK := CCL_BLOCK
CCL_EOF_BLOCK := CCL_BLOCK

CCL_BLOCK :=
        STATEMENT | (STATEMENT [STATEMENT ...])
STATEMENT :=
        SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE | CALL
        | TRANSLATE | MAP | END

SET :=
        (REG = EXPRESSION)
        | (REG ASSIGNMENT_OPERATOR EXPRESSION)
        | INT-OR-CHAR

EXPRESSION := ARG | (EXPRESSION OPERATOR ARG)

IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK])
BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
LOOP := (loop STATEMENT [STATEMENT ...])
BREAK := (break)
REPEAT :=
        (repeat)
        | (write-repeat [REG | INT-OR-CHAR | string])
        | (write-read-repeat REG [INT-OR-CHAR | ARRAY])
READ :=
        (read REG ...)
        | (read-if (REG OPERATOR ARG) CCL_BLOCK [CCL_BLOCK])
        | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
WRITE :=
        (write REG ...)
        | (write EXPRESSION)
        | (write INT-OR-CHAR) | (write string) | (write REG ARRAY)
        | string
CALL := (call ccl-program-name)
END := (end)

REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
ARG := REG | INT-OR-CHAR
OPERATOR :=
        + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
        | < | > | == | <= | >= | != | de-sjis | en-sjis
ASSIGNMENT_OPERATOR :=
        += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
ARRAY := '[' INT-OR-CHAR ... ']'
INT-OR-CHAR := integer | character

@end format

@node CCL Statements, CCL Expressions, CCL Syntax, CCL
@comment Node, Next, Previous, Up
@subsection CCL Statements

The Emacs Code Conversion Language provides the following statement
types: @dfn{set}, @dfn{if}, @dfn{branch}, @dfn{loop}, @dfn{repeat},
@dfn{break}, @dfn{read}, @dfn{write}, @dfn{call}, and @dfn{end}.

@heading Set statement:

The @dfn{set} statement has three variants with the syntaxes
@samp{(@var{reg} = @var{expression})},
@samp{(@var{reg} @var{assignment_operator} @var{expression})}, and
@samp{@var{integer}}. The assignment operator variant of the
@dfn{set} statement works the same way as the corresponding C expression
statement. The assignment operators are @code{+=}, @code{-=},
@code{*=}, @code{/=}, @code{%=}, @code{&=}, @code{|=}, @code{^=},
@code{<<=}, and @code{>>=}, with the same meanings as in C. A
"naked integer" @var{integer} is equivalent to a @dfn{set} statement of
the form @code{(r0 = @var{integer})}.

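The effect of the three @dfn{set} variants on the register file can be
sketched in Python. This is a model written for illustration, not the
interpreter: @code{run_set} is a hypothetical name, the eight registers are
a plain list, operands are restricted to integers, and @code{/=} and
@code{%=} are left out because C and Python differ on negative operands.

```python
# Illustrative model only: registers r0..r7 are a Python list of eight ints.

def run_set(regs, stmt):
    """Apply one set statement to regs in place.

    stmt is either a bare integer (shorthand for r0 = integer) or a
    tuple (register_index, operator, operand) with an integer operand.
    """
    if isinstance(stmt, int):
        regs[0] = stmt            # naked integer: same as (r0 = integer)
        return
    reg, op, value = stmt
    if op == "=":
        regs[reg] = value
    elif op == "+=":
        regs[reg] += value
    elif op == "-=":
        regs[reg] -= value
    elif op == "*=":
        regs[reg] *= value
    elif op == "&=":
        regs[reg] &= value
    elif op == "|=":
        regs[reg] |= value
    elif op == "^=":
        regs[reg] ^= value
    elif op == "<<=":
        regs[reg] <<= value
    elif op == ">>=":
        regs[reg] >>= value
    else:
        raise ValueError("operator not covered by this sketch: " + op)


regs = [0] * 8
for stmt in (65, (1, "=", 3), (1, "<<=", 4), (1, "|=", 5)):
    run_set(regs, stmt)
# regs[0] is now 65 and regs[1] is (3 << 4) | 5 == 53
```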
@heading I/O statements:

The @dfn{read} statement takes one or more registers as arguments. It
reads one byte (a C char) from the input into each register in turn.

The @dfn{write} statement takes several forms. In the form
@samp{(write @var{reg} ...)} it takes one or more registers as arguments
and writes each in turn to the output. The integer in a register
(interpreted as an Ichar) is encoded to multibyte form (i.e., Ibytes)
and written to the current output buffer. If it is less than 256, it is
written as is. The forms @samp{(write @var{expression})} and
@samp{(write @var{integer})} are treated analogously. The form
@samp{(write @var{string})} writes the constant string to the output. A
"naked string" @samp{@var{string}} is equivalent to the statement
@samp{(write @var{string})}. The form
@samp{(write @var{reg} @var{array})} writes the @var{reg}th element of
the @var{array} to the output.

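As a rough illustration of the multi-register @code{read} and the array
form of @code{write}, here is a Python model. The helper names
@code{read_regs} and @code{write_indexed} are hypothetical, and the model
assumes that the array is indexed from zero, which the text above does not
state explicitly.

```python
# Illustrative model only; registers are a list of eight ints.

def read_regs(source, regs, targets):
    """(read REG ...): one input byte goes into each target register in turn."""
    for reg in targets:
        regs[reg] = next(source)


def write_indexed(output, regs, reg, array):
    """(write REG ARRAY): append the element of array selected by the register.

    Zero-based indexing is an assumption of this sketch.
    """
    output.append(array[regs[reg]])


regs = [0] * 8
source = iter(b"\x02\x00")
output = []
read_regs(source, regs, [0, 1])               # r0 = 2, r1 = 0
write_indexed(output, regs, 0, [10, 20, 30])  # writes element 2
write_indexed(output, regs, 1, [10, 20, 30])  # writes element 0
```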
@heading Conditional statements:

The @dfn{if} statement takes an @var{expression}, a @var{CCL block}, and
an optional @var{second CCL block} as arguments. If the
@var{expression} evaluates to non-zero, the first @var{CCL block} is
executed. Otherwise, if there is a @var{second CCL block}, it is
executed.

The @dfn{read-if} variant of the @dfn{if} statement takes an
@var{expression}, a @var{CCL block}, and an optional @var{second CCL
block} as arguments. The @var{expression} must have the form
@code{(@var{reg} @var{operator} @var{operand})} (where @var{operand} is
a register or an integer). The @code{read-if} statement first reads
from the input into the first register operand in the @var{expression},
then conditionally executes a CCL block just as the @code{if} statement
does.

The @dfn{branch} statement takes an @var{expression} and one or more CCL
blocks as arguments. The CCL blocks are treated as a zero-indexed
array, and the @code{branch} statement uses the @var{expression} as the
index of the CCL block to execute. Null CCL blocks may be used as
no-ops, continuing execution with the statement following the
@code{branch} statement in the containing CCL block. Out-of-range
values for the @var{expression} are also treated as no-ops.

The @dfn{read-branch} variant of the @dfn{branch} statement takes a
@var{register} and one or more CCL blocks as arguments. The
@code{read-branch} statement first reads from the input into the
@var{register}, then uses the value read as the index of the CCL block
to execute, just as the @code{branch} statement does.

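The block selection performed by @code{if} and @code{branch} can be
modelled in a few lines of Python. The functions @code{select_if} and
@code{select_branch} are hypothetical names used only for this sketch;
they return the block that would run, with @code{None} standing for
"nothing is executed".

```python
# Illustrative model only: blocks are arbitrary Python values.

def select_if(value, then_block, else_block=None):
    """if: a non-zero value picks the first block, otherwise the second (if any)."""
    if value != 0:
        return then_block
    return else_block


def select_branch(index, blocks):
    """branch: use index as a zero-based position in blocks.

    An out-of-range index selects nothing, so execution would simply
    continue after the branch statement.
    """
    if 0 <= index < len(blocks):
        return blocks[index]      # may itself be an empty (null) block
    return None


blocks = ["block-0", [], "block-2"]
chosen = select_branch(2, blocks)     # "block-2"
skipped = select_branch(7, blocks)    # None: index is out of range
```

The @code{read-if} and @code{read-branch} variants would first store one
input byte in the register and then make the same selection.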
@heading Loop control statements:

The @dfn{loop} statement creates a block with an implied jump from the
end of the block back to its head. The loop is exited on a @code{break}
statement, and continued without executing the tail by a @code{repeat}
statement.

The @dfn{break} statement, written @samp{(break)}, terminates the
current loop and continues with the next statement in the current
block.

The @dfn{repeat} statement has three variants, @code{repeat},
@code{write-repeat}, and @code{write-read-repeat}. Each continues the
current loop from its head, possibly after performing I/O.
@code{repeat} takes no arguments and does no I/O before jumping.
@code{write-repeat} takes a single argument (a register, an
integer, or a string), writes it to the output, then jumps.
@code{write-read-repeat} takes one or two arguments. The first must
be a register. The second may be an integer or an array; if absent, it
is implicitly set to the first (register) argument.
@code{write-read-repeat} writes its second argument to the output, then
reads from the input into the register, and finally jumps. See the
@code{write} and @code{read} statements for the semantics of the I/O
operations for each type of argument.

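To make the write, read, jump order of @code{write-read-repeat} concrete,
the following Python sketch models a program of the shape
@code{((read r0) (loop (write-read-repeat r0)))}, optionally with an array
as the second argument. It is an illustration under assumptions, not the
interpreter: @code{copy_with_table} is a hypothetical name, the array is
taken to be indexed by the register value from zero, and running out of
input simply ends the main block.

```python
# Illustrative model only; input and output are Python bytes.

def copy_with_table(data, table=None):
    """Model ((read r0) (loop (write-read-repeat r0 [TABLE]))) on bytes."""
    source = iter(data)
    output = bytearray()
    try:
        r0 = next(source)                 # (read r0)
        while True:                       # the loop body is one statement
            if table is None:
                output.append(r0)         # write the register itself
            else:
                output.append(table[r0])  # write the element selected by r0
            r0 = next(source)             # then read the next byte into r0
            # write-read-repeat finally jumps to the loop head, which the
            # enclosing while loop models
    except StopIteration:
        pass                              # input exhausted: main block ends
    return bytes(output)


# A 256-entry table that maps ASCII lowercase letters to uppercase.
upcase = list(range(256))
for code in range(ord("a"), ord("z") + 1):
    upcase[code] = code - 32

plain = copy_with_table(b"abc")
shouted = copy_with_table(b"abc", upcase)
```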
@heading Other control statements:

The @dfn{call} statement, written @samp{(call @var{ccl-program-name})},
executes a CCL program as a subroutine. It does not return a value to
the caller, but can modify the register status.

The @dfn{end} statement, written @samp{(end)}, terminates the CCL
program successfully, and returns to the caller (which may be a CCL
program). It does not alter the status of the registers.

@node CCL Expressions, Calling CCL, CCL Statements, CCL
@comment Node, Next, Previous, Up
@subsection CCL Expressions

CCL, unlike Lisp, uses infix expressions. The simplest CCL expressions
consist of a single @var{operand}, either a register (one of @code{r0},
..., @code{r7}) or an integer. Complex expressions are lists of the
form @code{( @var{expression} @var{operator} @var{operand} )}. Unlike
C, assignments are not expressions.

In the following table, @var{X} is the target register for a @dfn{set}.
In subexpressions, this is implicitly @code{r7}. This means that
@code{>8}, @code{//}, @code{de-sjis}, and @code{en-sjis} cannot be used
freely in subexpressions, since they return parts of their values in
@code{r7}. @var{Y} may be an expression, register, or integer, while
@var{Z} must be a register or an integer.

@multitable @columnfractions .22 .14 .09 .55
@item Name @tab Operator @tab Code @tab C-like Description
@item CCL_PLUS @tab @code{+} @tab 0x00 @tab X = Y + Z
@item CCL_MINUS @tab @code{-} @tab 0x01 @tab X = Y - Z
@item CCL_MUL @tab @code{*} @tab 0x02 @tab X = Y * Z
@item CCL_DIV @tab @code{/} @tab 0x03 @tab X = Y / Z
@item CCL_MOD @tab @code{%} @tab 0x04 @tab X = Y % Z
@item CCL_AND @tab @code{&} @tab 0x05 @tab X = Y & Z
@item CCL_OR @tab @code{|} @tab 0x06 @tab X = Y | Z
@item CCL_XOR @tab @code{^} @tab 0x07 @tab X = Y ^ Z
@item CCL_LSH @tab @code{<<} @tab 0x08 @tab X = Y << Z
@item CCL_RSH @tab @code{>>} @tab 0x09 @tab X = Y >> Z
@item CCL_LSH8 @tab @code{<8} @tab 0x0A @tab X = (Y << 8) | Z
@item CCL_RSH8 @tab @code{>8} @tab 0x0B @tab X = Y >> 8, r[7] = Y & 0xFF
@item CCL_DIVMOD @tab @code{//} @tab 0x0C @tab X = Y / Z, r[7] = Y % Z
@item CCL_LS @tab @code{<} @tab 0x10 @tab X = (Y < Z)
@item CCL_GT @tab @code{>} @tab 0x11 @tab X = (Y > Z)
@item CCL_EQ @tab @code{==} @tab 0x12 @tab X = (Y == Z)
@item CCL_LE @tab @code{<=} @tab 0x13 @tab X = (Y <= Z)
@item CCL_GE @tab @code{>=} @tab 0x14 @tab X = (Y >= Z)
@item CCL_NE @tab @code{!=} @tab 0x15 @tab X = (Y != Z)
@item CCL_ENCODE_SJIS @tab @code{en-sjis} @tab 0x16 @tab X = HIGHER_BYTE (SJIS (Y, Z))
@item @tab @tab @tab r[7] = LOWER_BYTE (SJIS (Y, Z))
@item CCL_DECODE_SJIS @tab @code{de-sjis} @tab 0x17 @tab X = HIGHER_BYTE (DE-SJIS (Y, Z))
@item @tab @tab @tab r[7] = LOWER_BYTE (DE-SJIS (Y, Z))
@end multitable

442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1980 The CCL operators are as in C, with the addition of CCL_LSH8, CCL_RSH8,
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1981 CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS. The CCL_ENCODE_SJIS
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1982 and CCL_DECODE_SJIS treat their first and second bytes as the high and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1983 low bytes of a two-byte character code. (SJIS stands for Shift JIS, an
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1984 encoding of Japanese characters used by Microsoft. CCL_ENCODE_SJIS is a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1985 complicated transformation of the Japanese standard JIS encoding to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1986 Shift JIS. CCL_DECODE_SJIS is its inverse.) It is somewhat odd to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1987 represent the SJIS operations in infix form.
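
As a rough illustration of the transformation mentioned above, the byte arithmetic usually given for converting a JIS X 0208 byte pair to Shift JIS and back can be sketched in Python. The function names are hypothetical, and the sketch is not taken from the CCL interpreter; it assumes both input bytes are already in the valid range and does not model error handling.

```python
def jis_to_sjis(j1, j2):
    """Map a JIS X 0208 byte pair (each 0x21..0x7E) to a Shift JIS byte pair."""
    s1 = ((j1 - 0x21) >> 1) + 0x81
    if s1 > 0x9F:
        s1 += 0x40              # skip the single-byte range 0xA0..0xDF
    if j1 & 1:                  # odd row: trail bytes 0x40..0x9E, skipping 0x7F
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:                       # even row: trail bytes 0x9F..0xFC
        s2 = j2 + 0x7E
    return s1, s2


def sjis_to_jis(s1, s2):
    """Inverse of jis_to_sjis for byte pairs produced by it."""
    if s1 >= 0xE0:
        s1 -= 0x40
    row_pair = (s1 - 0x81) * 2  # two JIS rows share one Shift JIS lead byte
    if s2 >= 0x9F:              # even row of the pair
        return row_pair + 0x22, s2 - 0x7E
    if s2 >= 0x80:              # undo the skip over 0x7F
        s2 -= 1
    return row_pair + 0x21, s2 - 0x1F
```

Under this sketch the JIS pair 0x30 0x21 maps to the Shift JIS pair 0x88 0x9F, and decoding that pair gives back 0x30 0x21.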
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1988
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1989 @node Calling CCL, CCL Examples, CCL Expressions, CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1990 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1991 @subsection Calling CCL
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1992
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1993 CCL programs are called automatically during Emacs buffer I/O when the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1994 external representation has a coding system type of @code{shift-jis},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1995 @code{big5}, or @code{ccl}. The program is specified by the coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1996 system (@pxref{Coding Systems}). You can also call CCL programs from
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1997 other CCL programs, and from Lisp using these functions:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1998
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1999 @defun ccl-execute ccl-program status
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2000 Execute @var{ccl-program} with registers initialized by
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2001 @var{status}. @var{ccl-program} is a vector of compiled CCL code
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2002 created by @code{ccl-compile}. It is an error for the program to try to
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2003 execute a CCL I/O command. @var{status} must be a vector of nine
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2004 values, specifying the initial value for the R0, R1 .. R7 registers and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2005 for the instruction counter IC. A @code{nil} value for a register
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2006 initializer causes the register to be set to 0. A @code{nil} value for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2007 the IC initializer causes execution to start at the beginning of the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2008 program. When the program is done, @var{status} is modified (by
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2009 side-effect) to contain the ending values for the corresponding
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2010 registers and IC.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2011 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2012
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2013 @defun ccl-execute-on-string ccl-program status string &optional continue
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2014 Execute @var{ccl-program} with initial @var{status} on
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2015 @var{string}. @var{ccl-program} is a vector of compiled CCL code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2016 created by @code{ccl-compile}. @var{status} must be a vector of nine
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2017 values, specifying the initial value for the R0, R1 .. R7 registers and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2018 for the instruction counter IC. A @code{nil} value for a register
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2019 initializer causes the register to be set to 0. A @code{nil} value for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2020 the IC initializer causes execution to start at the beginning of the
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2021 program. An optional fourth argument @var{continue}, if non-@code{nil}, causes
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2022 the IC to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2023 remain on the unsatisfied read operation if the program terminates due
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2024 to exhaustion of the input buffer. Otherwise the IC is set to the end
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2025 of the program. When the program is done, @var{status} is modified (by
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2026 side-effect) to contain the ending values for the corresponding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2027 registers and IC. Returns the resulting string.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2028 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2029
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
2030 To call a CCL program from another CCL program, it must first be
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2031 registered:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2032
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2033 @defun register-ccl-program name ccl-program
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2034 Register @var{name} for CCL program @var{ccl-program} in
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2035 @code{ccl-program-table}. @var{ccl-program} should be the compiled form of
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2036 a CCL program, or @code{nil}. Return the index number of the registered CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2037 program.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2038 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2039
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
2040 Information about the processor time used by the CCL interpreter can be
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2041 obtained using these functions:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2042
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2043 @defun ccl-elapsed-time
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2044 Returns the elapsed processor time of the CCL interpreter as a cons of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2045 user and system time, as
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2046 floating point numbers measured in seconds. If only one
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2047 overall value can be determined, the return value will be a cons of that
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2048 value and 0.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2049 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2050
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2051 @defun ccl-reset-elapsed-time
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2052 Resets the CCL interpreter's internal elapsed time registers.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2053 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2054
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
2055 @node CCL Examples, , Calling CCL, CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2056 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2057 @subsection CCL Examples
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2058
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
2059 This section is not yet written.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2060
775
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2061 @node Category Tables, Unicode Support, CCL, MULE
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2062 @section Category Tables
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2063
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2064 A category table is a type of char table used for keeping track of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2065 categories. Categories are used for classifying characters for use in
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
2066 regexps---you can refer to a category rather than having to use a
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2067 complicated [] expression (and category lookups are significantly
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2068 faster).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2069
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2070 There are 95 different categories available, one for each printable
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2071 character (including space) in the ASCII charset. Each category is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2072 designated by one such character, called a @dfn{category designator}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2073 They are specified in a regexp using the syntax @samp{\c@var{X}}, where @var{X} is a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2074 category designator. (This is not yet implemented.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2075
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2076 A category table specifies, for each character, the categories that
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2077 the character is in. Note that a character can be in more than one
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2078 category. More specifically, a category table maps from a character to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2079 either the value @code{nil} (meaning the character is in no categories)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2080 or a 95-element bit vector, specifying for each of the 95 categories
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2081 whether the character is in that category.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2082
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2083 Special Lisp functions are provided that abstract this, so you do not
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2084 have to directly manipulate bit vectors.
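
To make the table-value representation concrete, here is a small Python model of a category table value. It is only a sketch: the helper names are hypothetical, and the mapping from a designator to a bit position (character code minus 32) is an assumption made for illustration, since the text above does not specify it.

```python
NUM_CATEGORIES = 95  # one per printable ASCII character, ' ' (32) through '~' (126)


def designator_index(designator):
    """Bit position assumed for a category designator (code minus 32)."""
    code = ord(designator)
    if not 32 <= code <= 126:
        raise ValueError("not a category designator")
    return code - 32


def in_category(value, designator):
    """value is None (no categories) or a list of 95 booleans."""
    return value is not None and value[designator_index(designator)]


# A character that belongs to the categories designated by 'a' and '~'.
value = [False] * NUM_CATEGORIES
value[designator_index("a")] = True
value[designator_index("~")] = True
```

A value of None plays the role of @code{nil} here: the character is in no category at all.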
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2085
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2086 @defun category-table-p object
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2087 This function returns @code{t} if @var{object} is a category table.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2088 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2089
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2090 @defun category-table &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2091 This function returns the current category table. This is the one
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2092 specified by the current buffer, or by @var{buffer} if it is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2093 non-@code{nil}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2094 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2095
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2096 @defun standard-category-table
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2097 This function returns the standard category table. This is the one used
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2098 for new buffers.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2099 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2100
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2101 @defun copy-category-table &optional category-table
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2102 This function returns a new category table which is a copy of
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2103 @var{category-table}, which defaults to the standard category table.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2104 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2105
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2106 @defun set-category-table category-table &optional buffer
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2107 This function selects @var{category-table} as the new category table for
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2108 @var{buffer}. @var{buffer} defaults to the current buffer if omitted.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2109 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2110
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2111 @defun category-designator-p object
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2112 This function returns @code{t} if @var{object} is a category designator (a
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2113 char in the range @samp{' '} to @samp{'~'}).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2114 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2115
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2116 @defun category-table-value-p object
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2117 This function returns @code{t} if @var{object} is a category table value.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2118 Valid values are @code{nil} or a bit vector of size 95.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2119 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2120
775
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2121
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2122 @c Added 2002-03-13 sjt
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2123 @node Unicode Support, Charset Unification, Category Tables, MULE
775
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2124 @section Unicode Support
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2125 @cindex unicode
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2126 @cindex utf-8
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2127 @cindex utf-16
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2128 @cindex ucs-2
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2129 @cindex ucs-4
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2130 @cindex bmp
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2131 @cindex basic multilingual plane
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2132
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2133 Unicode support was added by Ben Wing to XEmacs 21.5.6.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2134
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2135 @defun set-language-unicode-precedence-list list
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2136 Set the language-specific precedence list used for Unicode decoding.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2137 This is a list of charsets, which are consulted in order for a translation
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2138 matching a given Unicode character. If no matches are found, the charsets
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2139 in the default precedence list (see
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2140 @code{set-default-unicode-precedence-list}) are consulted, and then all
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2141 remaining charsets, in some arbitrary order.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2142
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2143 The language-specific precedence list is meant to be set as part of the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2144 language environment initialization; the default precedence list is meant
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2145 to be set by the user.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2146 @end defun
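
The lookup order described above (language-specific list first, then the default list, then all remaining charsets) can be modeled in a few lines of Python. This is a sketch of the documented behavior, not of the actual implementation; the charset names and translation data are made up, and the order of the remaining charsets, which the manual calls arbitrary, is simply the order of the list passed in.

```python
def lookup_order(language_list, default_list, all_charsets):
    """Charsets in the order they are consulted, without duplicates."""
    order = []
    for charset in list(language_list) + list(default_list) + list(all_charsets):
        if charset not in order:
            order.append(charset)
    return order


def decode(code, order, tables):
    """Return (charset, charset codepoint) from the first charset that maps code."""
    for charset in order:
        table = tables.get(charset)
        if table is not None and code in table:
            return charset, table[code]
    return None


# Hypothetical charsets and made-up translation data.
tables = dict([
    ("charset-a", dict([(0x00E9, 0x69)])),
    ("charset-b", dict([(0x00E9, 0x6A), (0x20AC, 0x24)])),
])
order = lookup_order(["charset-b"], ["charset-a"], ["charset-a", "charset-b"])
print(decode(0x00E9, order, tables))  # ('charset-b', 106)
```

Because charset-b comes first in the language-specific list, it wins even though charset-a also has a translation for the same Unicode codepoint.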
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2147
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2148 @defun language-unicode-precedence-list
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2149 Return the language-specific precedence list used for Unicode decoding.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2150 See @code{set-language-unicode-precedence-list} for more information.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2151 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2152
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2153 @defun set-default-unicode-precedence-list list
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2154 Set the default precedence list used for Unicode decoding.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2155 This is meant to be set by the user. See
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2156 @code{set-language-unicode-precedence-list} for more information.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2157 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2158
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2159 @defun default-unicode-precedence-list
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2160 Return the default precedence list used for Unicode decoding.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2161 See @code{set-language-unicode-precedence-list} for more information.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2162 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2163
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2164 @defun set-unicode-conversion character code
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2165 Add conversion information between Unicode codepoints and characters.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2166 @var{character} is one of the following:
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2167
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2168 @itemize @bullet
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2169 @item A character (in which case @var{code} must be a non-negative integer)
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2170 @item A vector of characters (in which case @var{code} must be a vector of
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2171 non-negative integers of the same length)
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2172 @end itemize
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2173 Values of @var{code} above 2^20 - 1 are allowed for the purpose of specifying
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2174 private characters, but will cause errors when converted to UTF-16 or UTF-32.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2175 UCS-4 and UTF-8 can handle values to 2^31 - 1, but XEmacs Lisp integers top
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2176 out at 2^30 - 1.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2177 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2178
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2179 @defun character-to-unicode character
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2180 Convert @var{character} to a Unicode codepoint.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2181 When there is no international support (i.e. MULE is not defined),
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2182 this function simply does @code{char-to-int}.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2183 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2184
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2185 @defun unicode-to-character code &optional charsets
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2186 Convert Unicode codepoint @var{code} to a character.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2187 @var{code} should be a non-negative integer.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2188 If @var{charsets} is given, it should be a list of charsets, and only those
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2189 charsets will be consulted, in the given order, for a translation.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2190 Otherwise, the default ordering of all charsets will be used (see
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2191 @code{set-unicode-charset-precedence}).
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2192
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2193 When there is no international support (i.e. MULE is not defined),
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2194 this function simply does @code{int-to-char} and ignores the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2195 @var{charsets} argument.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2196 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2197
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2198 @defun parse-unicode-translation-table filename charset start end offset flags
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2199 Parse Unicode translation data in @var{filename} for MULE @var{charset}.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2200 Data is text, in the form of one translation per line -- charset
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2201 codepoint followed by Unicode codepoint. Numbers are decimal or hex
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2202 (preceded by 0x). Comments are marked with a #. Charset codepoints
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2203 for two-dimensional charsets should have the first octet stored in the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2204 high 8 bits of the hex number and the second in the low 8 bits.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2205
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2206 If @var{start} and @var{end} are given, only charset codepoints within
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2207 the given range will be processed. If @var{offset} is given, that value
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2208 will be added to all charset codepoints in the file to obtain the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2209 internal charset codepoint. @var{start} and @var{end} apply to the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2210 codepoints in the file, before @var{offset} is applied.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2211
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2212 (Note that, as usual, we assume that octets are in the range 32 to
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2213 127 or 33 to 126. If you have a table in kuten form, with octets in
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2214 the range 1 to 94, you will have to use an offset of 8224,
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2215 i.e. 0x2020.)
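
The file format and the roles of the range and offset arguments can be illustrated with a short Python sketch. It follows the description above (one translation per line, decimal or 0x-prefixed hex numbers, comments introduced by #, range applied before the offset). It is not the actual parser: the function name is hypothetical, treating the end of the range as inclusive is an assumption, and the flags are not modeled.

```python
def parse_table(lines, start=None, end=None, offset=0):
    """Return a list of (internal charset codepoint, Unicode codepoint) pairs."""
    pairs = []
    for line in lines:
        fields = line.split("#", 1)[0].split()   # drop comment, split columns
        if len(fields) < 2:
            continue                             # blank or comment-only line
        # Base 0 accepts 0x-prefixed hex and plain decimal (no leading zeros).
        charset_code = int(fields[0], 0)
        unicode_code = int(fields[1], 0)
        # The range test uses the codepoint as written in the file.
        if start is not None and charset_code < start:
            continue
        if end is not None and charset_code > end:   # inclusive end: assumption
            continue
        pairs.append((charset_code + offset, unicode_code))
    return pairs


# A kuten-form line: row 16, cell 1, stored as 0x1001.
kuten_lines = ["# made-up sample table", "0x1001 0x4E9C  # row 16, cell 1"]
print(parse_table(kuten_lines, offset=0x2020))  # [(12321, 20124)]
```

The offset 0x2020 is 8224 in decimal; it adds 32 to each of the two octets, so the kuten value 0x1001 becomes 0x3021 (12321).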
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2216
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2217 @var{flags}, if specified, control further how the tables are interpreted
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2218 and are used to special-case certain known table weirdnesses in the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2219 Unicode tables:
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2220
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2221 @table @code
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2222 @item ignore-first-column
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2223 Exactly as it sounds. The JIS X 0208 tables have 3 columns of data instead
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2224 of 2; the first is the Shift-JIS codepoint.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2225
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2226 @item big5
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2227 The charset codepoint is a Big Five codepoint; convert it to the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2228 proper hacked-up codepoint in @code{chinese-big5-1} or @code{chinese-big5-2}.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2229 @end table
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2230 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2231
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2232
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2233 @node Charset Unification, Charsets and Coding Systems, Unicode Support, MULE
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2234 @section Character Set Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2235
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2236 Mule suffers from a design defect that causes it to consider the ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2237 Latin character sets to be disjoint. This results in oddities such as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2238 files containing both ISO 8859/1 and ISO 8859/15 codes, and using ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2239 2022 control sequences to switch between them, as well as more plausible
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2240 but often unnecessary combinations like ISO 8859/1 with ISO 8859/2.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2241 This can be very annoying when sending messages or even in simple
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2242 editing on a single host. Unification works around the problem by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2243 converting as many characters as possible to use a single Latin coded
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2244 character set before saving the buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2245
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2246 This node and its children were ripp'd untimely from
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2247 @file{latin-unity.texi}, and have been quickly converted for use here.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2248 However, as APIs are likely to diverge, beware of inaccuracies. Please
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2249 report any you discover with @kbd{M-x report-xemacs-bug RET}, as well
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2250 as any ambiguities or downright unintelligible passages.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2251
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2252 A lot of the stuff here doesn't belong here; it belongs in the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2253 @ref{Top, , , xemacs, XEmacs User's Manual}. Report those as bugs,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2254 too, preferably with patches.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2255
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2256 @menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2257 * Overview:: Unification history and general information.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2258 * Usage:: An overview of the operation of Unification.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2259 * Configuration:: Configuring Unification for use.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2260 * Theory of Operation:: How Unification works.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2261 * What Unification Cannot Do for You:: Inherent problems of 8-bit charsets.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2262 * Charsets and Coding Systems:: Reference lists with annotations.
1188
11ff4edb6bb7 [xemacs-hg @ 2003-01-05 10:53:58 by youngs]
youngs
parents: 1183
diff changeset
2263 * Unification Internals:: Utilities and implementation details.
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2264 @end menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2265
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2266 @node Overview, Usage, Charset Unification, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2267 @subsection An Overview of Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2268
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2269 Mule suffers from a design defect that causes it to consider the ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2270 Latin character sets to be disjoint. This manifests itself when a user
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2271 enters characters using input methods associated with different coded
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2272 character sets into a single buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2273
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2274 A very important example involves email. Many sites, especially in the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2275 U.S., default to use of the ISO 8859/1 coded character set (also called
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2276 ``Latin 1,'' though these are somewhat different concepts). However,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2277 ISO 8859/1 provides a generic CURRENCY SIGN character. Now that the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2278 Euro has become the official currency of most countries in Europe, this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2279 is unsatisfactory (and in practice, useless). So Europeans generally
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2280 use ISO 8859/15, which is nearly identical to ISO 8859/1 for most
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2281 languages, except that it substitutes EURO SIGN for CURRENCY SIGN.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2282
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2283 Suppose a European user yanks text from a post encoded in ISO 8859/1
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2284 into a message composition buffer, and enters some text including the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2285 Euro sign. Then Mule will consider the buffer to contain both ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2286 8859/1 and ISO 8859/15 text, and MUAs such as Gnus will (if naively
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2287 programmed) send the message as a multipart mixed MIME body!
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2288
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2289 This is clearly stupid. What is not as obvious is that, just as any
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2290 European can include American English in their text because ASCII is a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2291 subset of ISO 8859/15, most European languages which use Latin
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2292 characters (e.g., German and Polish) can typically be mixed while using
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2293 only one Latin coded character set (in this case, ISO 8859/2). However,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2294 this often depends on exactly what text is to be encoded.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2295
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2296 Unification works around the problem by converting as many characters as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2297 possible to use a single Latin coded character set before saving the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2298 buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2299
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2300 @node Usage, Configuration, Overview, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2301 @subsection Operation of Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2302
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2303 Normally, Unification works in the background by installing
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2304 @code{unity-sanity-check} on @code{write-region-pre-hook}. This is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2305 done by default for the ISO 8859 Latin family of character sets. The
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2306 user activates this functionality for other character set families by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2307 invoking @code{enable-unification}, either interactively or in her
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2308 init file. @xref{Init File, , , xemacs}. Unification can be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2309 deactivated by invoking @code{disable-unification}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
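
As a minimal sketch of the activation described above, the following
fragment could go in the init file.  It assumes only the two commands
named in this node, called without arguments; whether any library must
be loaded first is not stated here, so treat that as an open question.

@example
;; Activate Unification for character set families beyond the
;; ISO 8859 Latin family, which is handled by default.
(enable-unification)

;; To switch the sanity check off again later, evaluate:
;; (disable-unification)
@end example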
2310
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2311 Unification also provides a few functions for remapping or recoding the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2312 buffer by hand. To @dfn{remap} a character means to change the buffer
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2313 representation of the character by using another coded character set.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2314 Remapping never changes the identity of the character, but may involve
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2315 altering the code point of the character. To @dfn{recode} a character
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2316 means to simply change the coded character set. Recoding never alters
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2317 the code point of the character, but may change the identity of the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2318 character. @xref{Theory of Operation}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
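
The difference between remapping and recoding can be illustrated with
one character.  The code points below are taken from the ISO 8859
standards themselves, not from the Unification tables, so this is an
illustration of the two terms rather than a trace of what the
implementation does.

@example
LATIN SMALL LETTER S WITH CARON
  in ISO 8859/2:    code point 0xB9
  in ISO 8859/15:   code point 0xA8

remap  0xB9 from ISO 8859/2 to ISO 8859/15
  => 0xA8, still LATIN SMALL LETTER S WITH CARON

recode 0xB9 from ISO 8859/2 as ISO 8859/15
  => 0xB9, now SUPERSCRIPT ONE
@end example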
2319
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2320 There are a few variables which determine which coding systems are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2321 always acceptable to Unification: @code{unity-ucs-list},
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2322 @code{unity-preferred-coding-system-list}, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2323 @code{unity-preapproved-coding-system-list}. The latter two default
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2324 to @code{()}, and should probably be avoided because they short-circuit
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2325 the sanity check. If you find you need to use them, consider reporting
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2326 it as a bug or request for enhancement. Because they seem unsafe, the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2327 recommended interface is likely to change.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2328
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2329 @menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2330 * Basic Functionality:: User interface and customization.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2331 * Interactive Usage:: Treating text by hand.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2332 Also documents the hook function(s).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2333 @end menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2334
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2335
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2336 @node Basic Functionality, Interactive Usage, , Usage
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2337 @subsubsection Basic Functionality
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2338
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2339 These functions and user options initialize and configure Unification.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2340 In normal use, none of these should be needed.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2341
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2342 @strong{These APIs are certain to change.}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2343
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2344 @defun enable-unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2345 Set up hooks and initialize variables for latin-unity.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2346
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2347 There are no arguments.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2348
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2349 This function is idempotent. It will reinitialize any hooks or variables
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2350 that are not in their initial state.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2351 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2352
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2353 @defun disable-unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2354 There are no arguments.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2355
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2356 Clean up hooks and void variables used by latin-unity.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2357 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2358
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2359 @defopt unity-ucs-list
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2360 List of coding systems considered to be universal.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2361
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2362 The default value is @code{'(utf-8 iso-2022-7 ctext escape-quoted)}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2363
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2364 Order matters; coding systems earlier in the list will be preferred when
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2365 recommending a coding system. These coding systems will not be used
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2366 without querying the user (unless they are also present in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2367 @code{unity-preapproved-coding-system-list}), and follow the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2368 @code{unity-preferred-coding-system-list} in the list of suggested
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2369 coding systems.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2370
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2371 If none of the preferred coding systems are feasible, the first in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2372 this list will be the default.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2373
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2374 Notes on certain coding systems: @code{escape-quoted} is a special
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2375 coding system used for autosaves and compiled Lisp in Mule. You should
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2376 @c #### fix in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2377 never delete this, although it is rare that a user would want to use it
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2378 directly. Unification does not try to be ``smart'' about other general
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2379 ISO 2022 coding systems, such as ISO-2022-JP. (They are not recognized
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2380 as equivalent to @code{iso-2022-7}.) If your preferred coding system is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2381 one of these, you may consider adding it to @code{unity-ucs-list}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2382 However, this will typically have the side effect that (e.g.) ISO 8859/1
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2383 files will be saved in 7-bit form with ISO 2022 escape sequences.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2384 @end defopt
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
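
A hedged sketch of adding a general ISO 2022 coding system to the
universal list follows.  It assumes that @code{iso-2022-jp} is the name
of the coding system you prefer and that the variable already holds its
default value when the form is evaluated.  The new entry is appended,
so the existing order of preference is kept.

@example
;; Append iso-2022-jp so that it ranks after the default entries.
(setq unity-ucs-list
      (append unity-ucs-list '(iso-2022-jp)))
@end example

As noted above, this may cause ISO 8859/1 files to be saved in 7-bit
form with ISO 2022 escape sequences.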
2385
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2386 Coding systems which are not Latin and not in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2387 @code{unity-ucs-list} are handled by short-circuiting checks of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2388 the coding system against the next two variables.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2389
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2390 @defopt unity-preapproved-coding-system-list
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2391 List of coding systems used without querying the user if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2392
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2393 The default value is @samp{(buffer-default preferred)}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2394
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2395 The first feasible coding system in this list is used. The special values
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2396 @samp{preferred} and @samp{buffer-default} may be present:
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2397
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2398 @table @code
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2399 @item buffer-default
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2400 Use the coding system used by @samp{write-region}, if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2401
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2402 @item preferred
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2403 Use the coding system specified by @samp{prefer-coding-system} if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2404 @end table
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2405
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2406 ``Feasible'' means that all characters in the buffer can be represented by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2407 the coding system. Coding systems in @samp{unity-ucs-list} are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2408 always considered feasible. Other feasible coding systems are computed
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2409 by @samp{unity-representations-feasible-region}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2410
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2411 Note that the first universal coding system in this list shadows all
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2412 other coding systems. In particular, if your preferred coding system is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2413 a universal coding system, and @code{preferred} is a member of this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2414 list, unification will blithely convert all your files to that coding
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2415 system. This is considered a feature, but it may surprise most users.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2416 Users who don't like this behavior should put @code{preferred} in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2417 @code{unity-preferred-coding-system-list}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2418 @end defopt
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
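
For users who do not want their files converted silently to a universal
preferred coding system, the advice above might be sketched as follows.
This assumes both variables are already bound to lists when the forms
are evaluated.  Removing @code{preferred} from the preapproved list is
an inference from the text, which states only that it should be put in
@code{unity-preferred-coding-system-list}.

@example
;; Stop using the preferred coding system without asking ...
(setq unity-preapproved-coding-system-list
      (delq 'preferred
            (copy-sequence unity-preapproved-coding-system-list)))

;; ... and offer it as the first suggestion instead.
(setq unity-preferred-coding-system-list
      (cons 'preferred unity-preferred-coding-system-list))
@end example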
2419
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2420 @defopt unity-preferred-coding-system-list
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2421 @c #### fix in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2422 List of coding systems suggested to the user if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2423
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2424 The default value is @samp{(iso-8859-1 iso-8859-15 iso-8859-2 iso-8859-3
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2425 iso-8859-4 iso-8859-9)}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2426
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2427 If none of the coding systems in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2428 @c #### fix in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2429 @code{unity-preapproved-coding-system-list} are feasible, this list
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2430 will be recommended to the user, followed by the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2431 @code{unity-ucs-list}. The first coding system in this list is default. The
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2432 special values @samp{preferred} and @samp{buffer-default} may be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2433 present:
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2434
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2435 @table @code
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2436 @item buffer-default
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2437 Use the coding system used by @samp{write-region}, if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2438
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2439 @item preferred
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2440 Use the coding system specified by @samp{prefer-coding-system} if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2441 @end table
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2442
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2443 ``Feasible'' means that all characters in the buffer can be represented by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2444 the coding system. Coding systems in @samp{unity-ucs-list} are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2445 always considered feasible. Other feasible coding systems are computed
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2446 by @samp{unity-representations-feasible-region}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2447 @end defopt
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2448
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2449
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2450 @defvar unity-iso-8859-1-aliases
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2451 List of coding systems to be treated as aliases of ISO 8859/1.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2452
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2453 The default value is @code{'(iso-8859-1)}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2454
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2455 This is not a user variable; to customize input of coding systems or
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2456 charsets, use @samp{unity-coding-system-alias-alist} or
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2457 @samp{unity-charset-alias-alist}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2458 @end defvar
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2459
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2460
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2461 @node Interactive Usage, , Basic Functionality, Usage
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2462 @section Interactive Usage
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2463
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2464 First, the hook function @code{unity-sanity-check} is documented.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2465 (It is placed here because it is not an interactive function, and there
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2466 is not yet a programmer's section of the manual.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2467
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2468 These functions provide access to internal functionality (such as the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2469 remapping function) and to extra functionality (the recoding functions
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2470 and the test function).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2471
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2472
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2473 @defun unity-sanity-check begin end filename append visit lockname &optional coding-system
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2474
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2475 Check if @var{coding-system} can represent all characters between
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2476 @var{begin} and @var{end}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2477
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2478 For compatibility with old broken versions of @code{write-region},
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2479 @var{coding-system} defaults to @code{buffer-file-coding-system}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2480 @var{filename}, @var{append}, @var{visit}, and @var{lockname} are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2481 ignored.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2482
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2483 Return nil if @code{buffer-file-coding-system} is not (ISO-2022-compatible)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2484 Latin. If @code{buffer-file-coding-system} is safe for the charsets
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2485 actually present in the buffer, return it. Otherwise, ask the user to
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2486 choose a coding system, and return that.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2487
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2488 This function does @emph{not} do the safe thing when
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2489 @code{buffer-file-coding-system} is nil (aka no-conversion). It
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2490 considers that ``non-Latin,'' and passes it on to the Mule detection
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2491 mechanism.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2492
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2493 This function is intended for use as a @code{write-region-pre-hook}. It
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2494 does nothing except return @var{coding-system} if @code{write-region}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2495 handlers are inhibited.
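
As a rough illustration of the intended use, the hook could be installed
by hand as shown below.  This is only a sketch: it assumes that
@code{write-region-pre-hook} exists in your XEmacs, and the package's own
initialization would normally be expected to take care of this for you.

@example
;; Assumption: this XEmacs provides `write-region-pre-hook'.
;; Run the sanity check before each region is written.
(add-hook 'write-region-pre-hook 'unity-sanity-check)
@end example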
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2496 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2497
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2498 @defun unity-buffer-representations-feasible
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2499
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2500 There are no arguments.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2501
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2502 Apply @code{unity-region-representations-feasible} to the current buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2503 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2504
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2505 @defun unity-region-representations-feasible begin end &optional buf
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2506
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2507 Return character sets that can represent the text from @var{begin} to @var{end} in @var{buf}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2508
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2509 @var{buf} defaults to the current buffer. Called interactively, it is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2510 applied to the region. The function assumes @var{begin} <= @var{end}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2511
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2512 The return value is a cons. The car is the list of character sets
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2513 that can individually represent all of the non-ASCII portion of the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2514 buffer, and the cdr is the list of character sets that can
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2515 individually represent all of the ASCII portion.
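
A minimal sketch of how the return value might be taken apart from Lisp
follows.  The variable names are illustrative only, and the message text
is not something the package itself produces.

@example
;; Illustrative only: inspect both halves of the returned cons.
(let* ((result (unity-region-representations-feasible
                (point-min) (point-max)))
       (non-ascii-charsets (car result))  ; sets covering non-ASCII text
       (ascii-charsets (cdr result)))     ; sets covering ASCII text
  (message "non-ASCII: %S; ASCII: %S"
           non-ascii-charsets ascii-charsets))
@end example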
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2516
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2517 The following is taken from a comment in the source. Please refer to
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2518 the source to be sure of an accurate description.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2519
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2520 The basic algorithm is to map over the region, compute the set of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2521 charsets that can represent each character (the ``feasible charset''),
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2522 and take the intersection of those sets.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2523
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2524 The current implementation takes advantage of the fact that ASCII
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2525 characters are common and cannot change asciisets. Then using
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2526 @code{skip-chars-forward} makes motion over ASCII subregions very fast.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2527
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2528 This same strategy could be applied generally by precomputing classes
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2529 of characters equivalent according to their effect on latinsets, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2530 adding a whole class to the @code{skip-chars-forward} string once a member is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2531 found.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2532
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2533 Probably efficiency is a function of the number of characters matched,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2534 or maybe the length of the match string? With @code{skip-category-forward}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2535 over a precomputed category table it should be really fast. In practice
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2536 for Latin character sets there are only 29 classes.
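
The intersection strategy described above can be sketched roughly as
follows.  This is not the package's implementation: the function
@code{my-region-feasible-charsets} and the helper
@code{my-char-feasible-charsets} (assumed to return the list of charsets
able to represent a given character) are hypothetical names, and the
sketch tracks only a single set, for the non-ASCII characters.

@example
;; Hypothetical sketch.  `my-char-feasible-charsets' is an assumed
;; helper, not part of the package.
(defun my-region-feasible-charsets (begin end all-charsets)
  "Intersect the feasible charsets of each non-ASCII char in the region."
  (save-excursion
    (goto-char begin)
    (let ((feasible all-charsets))
      ;; Skip runs of ASCII quickly; stop at each non-ASCII character.
      (while (progn (skip-chars-forward "\000-\177" end)
                    (< (point) end))
        (let ((ok (my-char-feasible-charsets (char-after (point)))))
          ;; Keep only the charsets that also cover this character.
          (setq feasible
                (delq nil
                      (mapcar (lambda (cs) (and (memq cs ok) cs))
                              feasible))))
        (forward-char 1))
      feasible)))
@end example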
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2537 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2538
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2539 @defun unity-remap-region begin end character-set &optional coding-system
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2540
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2541 Remap characters between @var{begin} and @var{end} to equivalents in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2542 @var{character-set}. Optional argument @var{coding-system} may be a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2543 coding system name (a symbol) or nil. Characters with no equivalent are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2544 left as-is.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2545
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2546 When called interactively, @var{begin} and @var{end} are set to the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2547 beginning and end, respectively, of the active region, and the function
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2548 prompts for @var{character-set}. The function does completion, knows
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2549 how to guess a character set name from a coding system name, and also
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2550 provides some common aliases. See @code{unity-guess-charset}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2551 There is no way to specify @var{coding-system}, as it has no useful
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2552 function interactively.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2553
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2554 Return @var{coding-system} if @var{coding-system} can encode all
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2555 characters in the region, t if @var{coding-system} is nil and the coding
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2556 system with G0 = 'ascii and G1 = @var{character-set} can encode all
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2557 characters, and otherwise nil. Note that a non-null return does
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2558 @emph{not} mean it is safe to write the file, only the specified region.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2559 (This behavior is useful for multipart MIME encoding and the like.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2560
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2561 Note: by default this function is quite fascist about universal coding
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2562 systems. It only admits @samp{utf-8}, @samp{iso-2022-7}, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2563 @samp{ctext}. Customize @code{unity-approved-ucs-list} to change
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2564 this.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2565
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2566 This function remaps characters that are artificially distinguished by Mule
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2567 internal code. It may change the code point as well as the character set.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2568 To recode characters that were decoded in the wrong coding system, use
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2569 @code{unity-recode-region}.
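
Called from Lisp, the function might be used along the following lines.
This is a sketch which assumes that the @samp{latin-iso8859-15} charset is
defined on your system; the messages are illustrative only.

@example
;; Try to unify the whole buffer onto ASCII plus Latin-9.
;; With no CODING-SYSTEM argument, a non-nil return means the
;; region can be encoded with G0 = ascii and G1 = the charset.
(if (unity-remap-region (point-min) (point-max) 'latin-iso8859-15)
    (message "Region is representable in ASCII plus Latin-9")
  (message "Some characters have no Latin-9 equivalent"))
@end example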
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2570 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2571
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2572 @defun unity-recode-region begin end wrong-cs right-cs
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2573
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2574 Recode characters between @var{begin} and @var{end} from @var{wrong-cs}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2575 to @var{right-cs}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2576
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2577 @var{wrong-cs} and @var{right-cs} are character sets. Characters retain
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2578 the same code point but the character set is changed. Only characters
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2579 from @var{wrong-cs} are changed to @var{right-cs}. The identity of the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2580 character may change. Note that this could be dangerous, if characters
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2581 whose identities you do not want changed are included in the region.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2582 This function cannot guess which characters you want changed, and which
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2583 should be left alone.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2584
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2585 When called interactively, @var{begin} and @var{end} are set to the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2586 beginning and end, respectively, of the active region, and the function
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2587 prompts for @var{wrong-cs} and @var{right-cs}. The function does
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2588 completion, knows how to guess a character set name from a coding system
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2589 name, and also provides some common aliases. See
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2590 @code{unity-guess-charset}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2591
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2592 Another way to accomplish this, but using coding systems rather than
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2593 character sets to specify the desired recoding, is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2594 @samp{unity-recode-coding-region}. That function may be faster
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2595 but is somewhat more dangerous, because it may recode more than one
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2596 character set.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2597
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2598 To change from one Mule representation to another without changing identity
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2599 of any characters, use @samp{unity-remap-region}.
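
For instance, text that was mistakenly decoded as Latin-1 but was really
Latin-2 might be repaired as sketched below.  Both charsets are assumed
to exist on your system.

@example
;; Reinterpret Latin-1 characters in the buffer as Latin-2.
;; Code points are kept; character identities may change.
(unity-recode-region (point-min) (point-max)
                     'latin-iso8859-1 'latin-iso8859-2)
@end example

Narrow the region to the affected text first where possible, because
every character from the wrong charset in the region is recoded.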
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2600 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2601
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2602 @defun unity-recode-coding-region begin end wrong-cs right-cs
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2603
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2604 Recode text between @var{begin} and @var{end} from @var{wrong-cs} to
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2605 @var{right-cs}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2606
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2607 @var{wrong-cs} and @var{right-cs} are coding systems. Characters retain
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2608 the same code point but the character set is changed. The identity of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2609 characters may change. This is an inherently dangerous function;
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2610 multilingual text may be recoded in unexpected ways. #### It's also
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2611 dangerous because the coding systems are not sanity-checked in the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2612 current implementation.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2613
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2614 When called interactively, @var{begin} and @var{end} are set to the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2615 beginning and end, respectively, of the active region, and the function
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2616 prompts for @var{wrong-cs} and @var{right-cs}. The function does
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2617 completion, knows how to guess a coding system name from a character set
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2618 name, and also provides some common aliases. See
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2619 @code{unity-guess-coding-system}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2620
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2621 Another, safer, way to accomplish this, using character sets rather
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2622 than coding systems to specify the desired recoding, is to use
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2623 @c #### fixme in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2624 @code{unity-recode-region}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2625
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2626 To change from one Mule representation to another without changing identity
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2627 of any characters, use @code{unity-remap-region}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2628 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2629
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2630 Helper functions for input of coding system and character set names.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2631
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2632 @defun unity-guess-charset candidate
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2633 Guess a charset based on the symbol @var{candidate}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2634
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2635 @var{candidate} itself is not tried as the value.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2636
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2637 Uses the natural mapping in @samp{unity-cset-codesys-alist}, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2638 the values in @samp{unity-charset-alias-alist}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2639 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2640
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2641 @defun unity-guess-coding-system candidate
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2642 Guess a coding system based on the symbol @var{candidate}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2643
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2644 @var{candidate} itself is not tried as the value.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2645
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2646 Uses the natural mapping in @samp{unity-cset-codesys-alist}, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2647 the values in @samp{unity-coding-system-alias-alist}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2648 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2649
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2650 @defun unity-example
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2651
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2652 A cheesy example for Unification.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2653
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2654 At present it just makes a multilingual buffer. To test, @code{setq}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2655 @code{buffer-file-coding-system} to some value, make the buffer dirty (e.g.,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2656 with @key{RET} @key{BackSpace}), and save.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2657 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2658
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2659
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2660 @node Configuration, Theory of Operation, Usage, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2661 @subsection Configuring Unification for Use
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2662
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2663 If you want Unification to be automatically initialized, invoke
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2664 @samp{enable-unification} with no arguments in your init file.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2665 @xref{Init File, , , xemacs}. If you are using GNU Emacs or an XEmacs
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2666 earlier than 21.1, you should also load @file{auto-autoloads} using the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2667 full path (@emph{never} @samp{require} @file{auto-autoloads} libraries).
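
A minimal init file fragment might look like the following.  The
@code{fboundp} guard is an addition for illustration; it simply avoids an
error when the function is not available.

@example
;; Initialize Unification at startup, if it is available.
(when (fboundp 'enable-unification)
  (enable-unification))
@end example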
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2668
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2669 You may wish to define aliases for commonly used character sets and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2670 coding systems for convenience in input.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2671
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2672 @defopt unity-charset-alias-alist
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2673 Alist mapping aliases to Mule charset names (symbols).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2674
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2675 The default value is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2676 @example
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2677 ((latin-1 . latin-iso8859-1)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2678 (latin-2 . latin-iso8859-2)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2679 (latin-3 . latin-iso8859-3)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2680 (latin-4 . latin-iso8859-4)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2681 (latin-5 . latin-iso8859-9)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2682 (latin-9 . latin-iso8859-15)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2683 (latin-10 . latin-iso8859-16))
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2684 @end example
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2685
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2686 If a charset does not exist on your system, it will not complete and you
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2687 will not be able to enter it in response to prompts. A real charset
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2688 with the same name as an alias in this list will shadow the alias.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2689 @end defopt
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2690
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2691 @defopt unity-coding-system-alias-alist nil
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2692 Alist mapping aliases to Mule coding system names (symbols).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2693
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2694 The default value is @samp{nil}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2695 @end defopt
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2696
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2697
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2698 @node Theory of Operation, What Unification Cannot Do for You, Configuration, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2699 @subsection Theory of Operation
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2700
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2701 Standard encodings suffer from the design defect that they do not
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2702 provide a reliable way to recognize which coded character sets are in use.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2703 @xref{What Unification Cannot Do for You}. There are scores of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2704 character sets which can be represented by a single octet (8-bit byte),
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2705 whose union contains many hundreds of characters. Obviously this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2706 results in great confusion, since you can't tell the players without a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2707 scorecard, and there is no scorecard.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2708
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2709 There are two ways to solve this problem. The first is to create a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2710 universal coded character set. This is the concept behind Unicode.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2711 However, there have been satisfactory (nearly) universal character sets
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2712 for several decades, but even today many Westerners resist using Unicode
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2713 because they consider its space requirements excessive. On the other
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2714 hand, Asians dislike Unicode because they consider it to be incomplete.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2715 (This is partly, but not entirely, political.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2716
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2717 In any case, Unicode only solves the internal representation problem.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2718 Many data sets will contain files in ``legacy'' encodings, and Unicode
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2719 does not help distinguish among them.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2720
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2721 The second approach is to embed information about the encodings used in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2722 a document in its text. This approach is taken by the ISO 2022
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2723 standard. This would solve the problem completely from the users' point of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2724 view, except that ISO 2022 is basically not implemented at all, in the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2725 sense that few applications or systems implement more than a small
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2726 subset of ISO 2022 functionality. This is due to the fact that
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2727 mono-literate users object to the presence of escape sequences in their
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2728 texts (which they, with some justification, consider data corruption).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2729 Programmers are more than willing to cater to these users, since
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2730 implementing ISO 2022 is a painstaking task.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2731
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2732 In fact, Emacs/Mule adopts both of these approaches. Internally it uses
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2733 a universal character set, @dfn{Mule code}. Externally it uses ISO 2022
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2734 techniques both to save files in forms robust to encoding issues, and as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2735 hints when attempting to ``guess'' an unknown encoding. However, Mule
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2736 suffers from a design defect, namely it embeds the character set
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2737 information that ISO 2022 attaches to runs of characters by introducing
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2738 them with a control sequence in each character. That causes Mule to
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2739 consider the ISO Latin character sets to be disjoint. This manifests
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2740 itself when a user enters characters using input methods associated with
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2741 different coded character sets into a single buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2742
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2743 There are two problems stemming from this design. First, Mule
1188
11ff4edb6bb7 [xemacs-hg @ 2003-01-05 10:53:58 by youngs]
youngs
parents: 1183
diff changeset
2744 represents the same character in different ways. Abstractly, 'ó'
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2745 (LATIN SMALL LETTER O WITH ACUTE) can get represented as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2746 [latin-iso8859-1 #x73] or as [latin-iso8859-2 #x73]. So what looks like
1188
11ff4edb6bb7 [xemacs-hg @ 2003-01-05 10:53:58 by youngs]
youngs
parents: 1183
diff changeset
2747 'óó' in the display might actually be represented [latin-iso8859-1
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2748 #x73][latin-iso8859-2 #x73] in the buffer, and saved as [#xF3 ESC - B
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2749 #xF3 ESC - A] in the file. In some cases this treatment would be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2750 appropriate (consider HYPHEN, MINUS SIGN, EN DASH, EM DASH, and U+4E00
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2751 (the CJK ideographic character meaning ``one'')), and although arguably
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2752 incorrect it is convenient when mixing the CJK scripts. But in the case
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2753 of the Latin scripts this is wrong.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2754
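A small Python sketch may make the bracketed notation concrete. It is an
illustration only: Python's codec names stand in for the Mule charsets, and
nothing here is Mule code. Both ISO 8859/1 and ISO 8859/2 place this letter
at octet #xF3; the #x73 shown above is presumably that octet with the high
bit cleared.

```python
# Illustration only: Python codecs stand in for the Mule charsets.
o_acute = "\u00f3"  # LATIN SMALL LETTER O WITH ACUTE

latin1_octet = o_acute.encode("iso8859-1")
latin2_octet = o_acute.encode("iso8859-2")

# The letter occupies the same octet in both coded character sets.
same_octet = latin1_octet == latin2_octet == b"\xf3"

# Position within the 96-character set: the octet with its high bit cleared.
position = latin1_octet[0] & 0x7F

print(same_octet, hex(position))
```

The two encodings agree on the octet, so the distinction Mule draws between
the two representations carries no information about the character itself.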
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2755 Worse yet, it is very likely to occur when mixing ``different'' encodings
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2756 (such as ISO 8859/1 and ISO 8859/15) that differ only in a few code
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2757 points that are almost never used. A very important example involves
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2758 email. Many sites, especially in the U.S., default to use of the ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2759 8859/1 coded character set (also called ``Latin 1,'' though these are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2760 somewhat different concepts). However, ISO 8859/1 provides a generic
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2761 CURRENCY SIGN character. Now that the Euro has become the official
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2762 currency of most countries in Europe, this is unsatisfactory (and in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2763 practice, useless). So Europeans generally use ISO 8859/15, which is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2764 nearly identical to ISO 8859/1 for most languages, except that it
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2765 substitutes EURO SIGN for CURRENCY SIGN.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2766
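The difference between the two character sets at this code point can be
checked with a short Python sketch. The helper name can_encode is
hypothetical, introduced only for this illustration.

```python
import unicodedata

octet = b"\xa4"

# The same octet names a different character in each coded character set.
as_latin1 = octet.decode("iso8859-1")
as_latin9 = octet.decode("iso8859-15")
names = [unicodedata.name(as_latin1), unicodedata.name(as_latin9)]


def can_encode(text, codec):
    """Return True if every character of text exists in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


euro_in_latin1 = can_encode("\u20ac", "iso8859-1")
euro_in_latin9 = can_encode("\u20ac", "iso8859-15")

print(names, euro_in_latin1, euro_in_latin9)
```

The octet #xA4 is CURRENCY SIGN in one set and EURO SIGN in the other, and
ISO 8859/1 has no code point for the Euro sign at all.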
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2767 Suppose a European user yanks text from a post encoded in ISO 8859/1
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2768 into a message composition buffer, and enters some text including the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2769 Euro sign. Then Mule will consider the buffer to contain both ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2770 8859/1 and ISO 8859/15 text, and MUAs such as Gnus will (if naively
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2771 programmed) send the message as a multipart mixed MIME body!
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2772
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2773 This is clearly stupid. What is not as obvious is that, just as any
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2774 European can include American English in their text because ASCII is a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2775 subset of ISO 8859/15, most European languages which use Latin
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2776 characters (eg, German and Polish) can typically be mixed while using
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2777 only one Latin coded character set (in the case of German and Polish,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2778 ISO 8859/2). However, this often depends on exactly what text is to be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2779 encoded (even for the same pair of languages).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2780
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2781 Unification works around the problem by converting as many characters as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2782 possible to use a single Latin coded character set before saving the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2783 buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2784
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2785 Because the problem is rarely noticeable in editing a buffer, but tends
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2786 to manifest when that buffer is exported to a file or process, the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2787 Unification package uses the strategy of examining the buffer prior to
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2788 export. If use of multiple Latin coded character sets is detected,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2789 Unification attempts to unify them by finding a single coded character
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2790 set which contains all of the Latin characters in the buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2791
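The search for a unifying character set can be sketched in Python. This is
not the Unification package's algorithm, only a minimal model of the idea:
the candidate list and the function name unifying_charsets are assumptions
made for the example, and Python codecs stand in for Mule charsets.

```python
# Hypothetical candidate list; a real implementation would consult the
# charsets known to Mule rather than Python codec names.
CANDIDATES = ["iso8859-1", "iso8859-2", "iso8859-15"]


def unifying_charsets(text, candidates=CANDIDATES):
    """Return the candidates that can encode every character in text."""
    usable = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this set
        usable.append(name)
    return usable


print(unifying_charsets("Stra\u00dfe z\u0142oty"))  # German plus Polish
print(unifying_charsets("10 \u20ac"))               # text with EURO SIGN
```

With this candidate list, the mixed German and Polish text fits only ISO
8859/2, and the text containing the Euro sign fits only ISO 8859/15. An
empty result would mean that no single candidate suffices and a universal
encoding is needed.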
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2792 The primary purpose of Unification is to fix the problem by giving the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2793 user the choice to change the representation of all characters to one
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2794 character set and give sensible recommendations based on context. In
1188
11ff4edb6bb7 [xemacs-hg @ 2003-01-05 10:53:58 by youngs]
youngs
parents: 1183
diff changeset
2795 the 'ó' example, either ISO 8859/1 or ISO 8859/2 is satisfactory, and
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2796 both will be suggested. In the EURO SIGN example, only ISO 8859/15
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2797 makes sense, and that is what will be recommended. In both cases, the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2798 user will be reminded that there are universal encodings available.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2799
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2800 I call this @dfn{remapping} (from the universal character set to a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2801 particular ISO 8859 coded character set). It is mere accident that this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2802 letter has the same code point in both character sets. (Not entirely,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2803 but there are many examples of Latin characters that have different code
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2804 points in different Latin-X sets.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2805
1188
11ff4edb6bb7 [xemacs-hg @ 2003-01-05 10:53:58 by youngs]
youngs
parents: 1183
diff changeset
2806 Note that, in the 'ó' example, treating the buffer in this way will
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2807 result in a representation such as [latin-iso8859-2
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2808 #x73][latin-iso8859-2 #x73], and the file will be saved as [#xF3 #xF3].
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2809 This is guaranteed to occasionally result in the second problem,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2810 to which we now turn.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2811
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2812 This problem is that, although the file is intended to be an
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2813 ISO-8859/2-encoded file, in an ISO 8859/1 locale Mule (and every POSIX
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2814 compliant program---this is required by the standard, obvious if you
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2815 think a bit, @pxref{What Unification Cannot Do for You}) will read that
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2816 file as [latin-iso8859-1 #x73] [latin-iso8859-1 #x73]. Of course this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2817 is no problem if all of the characters in the file are contained in ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2818 8859/1, but suppose there are some which are not, but are contained in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2819 the (intended) ISO 8859/2.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2820
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2821 You now want to fix this, but not by finding the same character in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2822 another set. Instead, you want to simply change the character set that
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2823 Mule associates with that buffer position without changing the code.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2824 (This is conceptually somewhat distinct from the first problem, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2825 logically ought to be handled in the code that defines coding systems.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2826 However, unification is not an unreasonable place for it.) Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2827 provides two functions (one fast and dangerous, the other slow and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2828 careful) to handle this. I call this @dfn{recoding}, because the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2829 transformation actually involves @emph{encoding} the buffer to file
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2830 representation, then @emph{decoding} it to buffer representation (in a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2831 different character set). This cannot be done automatically because
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2832 Mule can have no idea what the correct encoding is---after all, it
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2833 already gave you its best guess. @xref{What Unification Cannot Do for
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2834 You}. So these functions must be invoked by the user. @xref{Interactive
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2835 Usage}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2836
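Recoding, as described above, is an encode followed by a decode. The Python
sketch below models that round trip; the function name recode is
hypothetical and does not correspond to either of the package's two
functions. It resembles the fast and dangerous variant, in that it raises an
error if the text contains a character outside the assumed character set.

```python
def recode(text, assumed, intended):
    """Reinterpret text that was decoded as assumed but meant as intended.

    The octets are left unchanged; only the character set used to
    interpret them changes.
    """
    return text.encode(assumed).decode(intended)


# A Polish word saved in ISO 8859/2, then read in an ISO 8859/1 locale:
# octet #xB3 is displayed as SUPERSCRIPT THREE.
misread = "z\u0142oty".encode("iso8859-2").decode("iso8859-1")

# The user, who knows the text is Polish, recodes it.
fixed = recode(misread, "iso8859-1", "iso8859-2")

print(misread, fixed)
```

The choice of the intended character set is the part that only the user can
supply.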
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2837
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2838 @node What Unification Cannot Do for You, Unification Internals, Theory of Operation, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2839 @subsection What Unification Cannot Do for You
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2840
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2841 Unification @strong{cannot} save you if you insist on exporting data in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2842 8-bit encodings in a multilingual environment. @emph{You will
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2843 eventually corrupt data if you do this.} It is not Mule's, or any
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2844 application's, fault. You will have only yourself to blame; consider
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2845 yourself warned. (It is true that Mule has bugs, which make Mule
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2846 somewhat more dangerous and inconvenient than some naive applications.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2847 We're working to address those, but no application can remedy the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2848 inherent defect of 8-bit encodings.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2849
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2850 Use standard universal encodings, preferably Unicode (UTF-8) unless
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2851 applicable standards indicate otherwise. The most important such case
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2852 is Internet messages, where MIME should be used, whether or not the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2853 subordinate encoding is a universal encoding. (Note that since one of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2854 the important provisions of MIME is the @samp{Content-Type} header,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2855 which has the charset parameter, MIME is to be considered a universal
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2856 encoding for the purposes of this manual. Of course, technically
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2857 speaking it's neither a coded character set nor a coding extension
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2858 technique compliant with ISO 2022.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2859
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2860 As mentioned earlier, the problem is that standard encodings suffer from
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2861 the design defect that they do not provide a reliable way to recognize
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2862 which coded character sets are in use. There are scores of character
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2863 sets which can be represented by a single octet (8-bit byte), whose
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2864 union contains many hundreds of characters. Thus any 8-bit coded
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2865 character set must contain characters that share code points used for
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2866 different characters in other coded character sets.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2867
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2868 This means that a given file's intended encoding cannot be identified
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2869 with 100% reliability unless it contains encoding markers such as those
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2870 provided by MIME or ISO 2022.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2871
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2872 Unification actually makes it more likely that you will have problems of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2873 this kind. Traditionally Mule has been ``helpful'' by simply using an
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2874 ISO 2022 universal coding system when the current buffer coding system
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2875 cannot handle all the characters in the buffer. This has the effect
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2876 that, because the file contains control sequences, it is not recognized
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2877 as being in the locale's normal 8-bit encoding. It may be annoying if
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2878 you are not a Mule expert, but your data is automatically recoverable
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2879 with a tool you already have: Mule.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2880
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2881 However, with unification, Mule converts to a single 8-bit character set
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2882 when possible. But typically this will @emph{not} be in your usual
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2883 locale. That is, the times that an ISO 8859/1 user will need Unification are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2884 when there are ISO 8859/2 characters in the buffer. But then most
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2885 likely the file will be saved in a pure 8-bit encoding that is not ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2886 8859/1, i.e., ISO 8859/2. Mule's autorecognizer (which is probably the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2887 most sophisticated yet available) cannot tell the difference between ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2888 8859/1 and ISO 8859/2, and in a Western European locale will choose the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2889 former even though the latter was intended. Even the extension
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2890 (``statistical recognition'') planned for XEmacs 22 is unlikely to be at
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2891 all accurate in the case of mixed codes.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2892
So now consider adding some additional ISO 8859/1 text to the buffer.
If it includes any ISO 8859/1 codes that are used by different
characters in ISO 8859/2, you now have a file that cannot be
mechanically disentangled. You need a human being who can recognize
that @emph{this is German and Swedish} and stays in Latin-1, while
@emph{that is Polish} and needs to be recoded to Latin-2.

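The ambiguity can be seen directly at the byte level. The following is a
minimal sketch in Python, not part of XEmacs; the function names
@code{differing_bytes} and @code{misread_as_latin1} are invented for this
illustration. It lists the 8-bit code points that ISO 8859/1 and ISO
8859/2 assign to different characters, and shows what happens to a
Polish word saved as Latin-2 and read back as Latin-1.

```python
def differing_bytes():
    """Return the byte values 0xA0..0xFF that Latin-1 and Latin-2 decode differently."""
    result = []
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        if raw.decode("iso8859_1") != raw.decode("iso8859_2"):
            result.append(value)
    return result


def misread_as_latin1(text):
    """Encode TEXT as ISO 8859/2, then decode the very same bytes as ISO 8859/1."""
    return text.encode("iso8859_2").decode("iso8859_1")


ambiguous = differing_bytes()
# 0xB1 is PLUS-MINUS SIGN in ISO 8859/1 but "a with ogonek" in ISO 8859/2.
print(hex(0xB1), 0xB1 in ambiguous)
# 0xE9 is "e with acute" in both sets, so it causes no confusion.
print(hex(0xE9), 0xE9 in ambiguous)
# A Polish word ("l with stroke", "a with ogonek", "k", "a") misread as Latin-1.
print(ascii(misread_as_latin1("\u0142\u0105ka")))
```

Nothing in the bytes themselves records which reading was intended,
which is why mixed Latin-1 and Latin-2 text needs a human reader to
sort out.
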
Moral: switch to a universal coded character set, preferably Unicode
using the UTF-8 transformation format. If you really need the space,
compress your files.


@node Unification Internals, , What Unification Cannot Do for You, Charset Unification
@subsection Internals

No internals documentation yet.

@file{unity-utils.el} provides one utility function.

@defun unity-dump-tables

Dump the temporary table created by loading @file{unity-utils.el}
to @file{unity-tables.el}. Loading the latter file initializes
@samp{unity-equivalences}.
@end defun


@node Charsets and Coding Systems, , Charset Unification, MULE
@subsection Charsets and Coding Systems

This section provides reference lists of Mule charsets and coding
systems. Mule charsets are typically named by character set and
standard.

@table @strong
@item ASCII variants

Identification of equivalent characters in these sets is not properly
implemented. Unification does not distinguish the two charsets.

@samp{ascii} @samp{latin-jisx0201}

@item Extended Latin

Characters from the following ISO 2022 conformant charsets are
identified with equivalents in other charsets in the group by
Unification.

@samp{latin-iso8859-1} @samp{latin-iso8859-15} @samp{latin-iso8859-2}
@samp{latin-iso8859-3} @samp{latin-iso8859-4} @samp{latin-iso8859-9}
@samp{latin-iso8859-13} @samp{latin-iso8859-16}

The following charsets are Latin variants which are not understood by
Unification. In addition, many of the Asian language standards provide
ASCII, at least, and sometimes other Latin characters. None of these
are identified with their ISO 8859 equivalents.

@samp{vietnamese-viscii-lower}
@samp{vietnamese-viscii-upper}

@item Other character sets

@samp{arabic-1-column}
@samp{arabic-2-column}
@samp{arabic-digit}
@samp{arabic-iso8859-6}
@samp{chinese-big5-1}
@samp{chinese-big5-2}
@samp{chinese-cns11643-1}
@samp{chinese-cns11643-2}
@samp{chinese-cns11643-3}
@samp{chinese-cns11643-4}
@samp{chinese-cns11643-5}
@samp{chinese-cns11643-6}
@samp{chinese-cns11643-7}
@samp{chinese-gb2312}
@samp{chinese-isoir165}
@samp{cyrillic-iso8859-5}
@samp{ethiopic}
@samp{greek-iso8859-7}
@samp{hebrew-iso8859-8}
@samp{ipa}
@samp{japanese-jisx0208}
@samp{japanese-jisx0208-1978}
@samp{japanese-jisx0212}
@samp{katakana-jisx0201}
@samp{korean-ksc5601}
@samp{sisheng}
@samp{thai-tis620}
@samp{thai-xtis}

@item Non-graphic charsets

@samp{control-1}
@end table

@table @strong
@item No conversion

Some of these coding systems may specify EOL conventions. Note that
@samp{iso-8859-1} is a no-conversion coding system, not an ISO 2022
coding system. Although unification attempts to compensate for this, it
is possible that the @samp{iso-8859-1} coding system will behave
differently from other ISO 8859 coding systems.

@samp{binary} @samp{no-conversion} @samp{raw-text} @samp{iso-8859-1}

@item Latin coding systems

These coding systems are all single-byte, 8-bit ISO 2022 coding systems,
combining ASCII in the GL register (bytes with high-bit clear) and an
extended Latin character set in the GR register (bytes with high-bit set).

@samp{iso-8859-15} @samp{iso-8859-2} @samp{iso-8859-3} @samp{iso-8859-4}
@samp{iso-8859-9} @samp{iso-8859-13} @samp{iso-8859-14} @samp{iso-8859-16}

These coding systems are single-byte, 8-bit coding systems that do not
conform to international standards. They should be avoided in all
potentially multilingual contexts, including any text distributed over
the Internet and World Wide Web.

@samp{windows-1251}

@item Multilingual coding systems

The following ISO-2022-based coding systems are useful for multilingual
text.

@samp{ctext} @samp{iso-2022-lock} @samp{iso-2022-7} @samp{iso-2022-7bit}
@samp{iso-2022-7bit-ss2} @samp{iso-2022-8} @samp{iso-2022-8bit-ss2}

XEmacs also supports Unicode with the Mule-UCS package. The Unicode
coding systems listed below are the preferred coding systems for
multilingual use. (There is a possible exception for texts that mix
several Asian ideographic character sets.)

@samp{utf-16-be} @samp{utf-16-be-no-signature} @samp{utf-16-le}
@samp{utf-16-le-no-signature} @samp{utf-7} @samp{utf-7-safe}
@samp{utf-8} @samp{utf-8-ws}

Development versions of XEmacs (the 21.5 series) support Unicode
internally, with (at least) the following coding systems implemented:

@samp{utf-16-be} @samp{utf-16-be-bom} @samp{utf-16-le}
@samp{utf-16-le-bom} @samp{utf-8} @samp{utf-8-bom}

@item Asian ideographic languages

The following coding systems are based on ISO 2022, and are more or less
suitable for encoding multilingual texts. They all can represent ASCII
at least, and sometimes several other foreign character sets, without
resorting to arbitrary ISO 2022 designations. However, these subsets are
not identified with the corresponding national standards in XEmacs Mule.

@samp{chinese-euc} @samp{cn-big5} @samp{cn-gb-2312} @samp{gb2312}
@samp{hz} @samp{hz-gb-2312} @samp{old-jis} @samp{japanese-euc}
@samp{junet} @samp{euc-japan} @samp{euc-jp} @samp{iso-2022-jp}
@samp{iso-2022-jp-1978-irv} @samp{iso-2022-jp-2} @samp{euc-kr}
@samp{korean-euc} @samp{iso-2022-kr} @samp{iso-2022-int-1}

The following coding systems cannot be used for general multilingual
text and do not cooperate well with other coding systems.

@samp{big5} @samp{shift_jis}

@item Other languages

The following coding systems are based on ISO 2022. Though none of them
provides any Latin characters beyond ASCII, XEmacs Mule allows (and up
to 21.4 defaults to) use of ISO 2022 control sequences to designate
other character sets for inclusion in the text.

@samp{iso-8859-5} @samp{iso-8859-7} @samp{iso-8859-8}
@samp{ctext-hebrew}

The following coding systems do not conform to ISO 2022 and thus cannot
be safely used in a multilingual context.

@samp{alternativnyj} @samp{koi8-r} @samp{tis-620} @samp{viqr}
@samp{viscii} @samp{vscii}

@item Special coding systems

Mule uses the following coding systems for special purposes.

@samp{automatic-conversion} @samp{undecided} @samp{escape-quoted}

@samp{escape-quoted} is especially important, as it is used internally
as the coding system for autosaved data.

The following coding systems are aliases for others, and are used for
communication with the host operating system.

@samp{file-name} @samp{keyboard} @samp{terminal}

@end table

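The GL/GR division described under ``Latin coding systems'' in the table
above depends only on the high bit of each byte. The following is a
minimal sketch in Python, not XEmacs code; @code{split_gl_gr} is a name
invented for this illustration.

```python
def split_gl_gr(data):
    """Partition DATA into GL bytes (high bit clear) and GR bytes (high bit set)."""
    gl = [b for b in data if (b & 0x80) == 0]
    gr = [b for b in data if (b & 0x80) != 0]
    return gl, gr


# "5", a space, and the euro sign, which ISO 8859/15 places at 0xA4.
sample = "5 \u20ac".encode("iso8859_15")
gl, gr = split_gl_gr(sample)
print([hex(b) for b in gl], [hex(b) for b in gr])
```

Every coding system in that group interprets the GL half as ASCII. Only
the meaning of the GR half changes from one coding system to the next.
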
Mule detection of coding systems is actually limited to detection of
classes of coding systems called @dfn{coding categories}. These coding
categories are identified by the ISO 2022 control sequences they use, if
any, by their conformance to ISO 2022 restrictions on code points that
may be used, and by characteristic patterns of use of 8-bit code points.

@samp{no-conversion}
@samp{utf-8}
@samp{ucs-4}
@samp{iso-7}
@samp{iso-lock-shift}
@samp{iso-8-1}
@samp{iso-8-2}
@samp{iso-8-designate}
@samp{shift-jis}
@samp{big5}


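To make the three criteria concrete, here is a deliberately crude sketch
in Python. It is not Mule's detector. The order of the checks, the
function name @code{guess_category}, and the choice to report plain
7-bit data as @samp{no-conversion} are all assumptions made for
illustration; only the category names come from the list above.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F


def guess_category(data):
    """Return a rough coding-category guess for DATA (illustration only)."""
    has_high = any(b >= 0x80 for b in data)
    if not has_high:
        # Pure 7-bit data: only control sequences can distinguish anything.
        if SO in data or SI in data:
            return "iso-lock-shift"
        if ESC in data:
            return "iso-7"
        return "no-conversion"  # assumption: nothing to tell apart
    try:
        # Characteristic pattern of 8-bit use: well-formed UTF-8 sequences.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    if any(0x80 <= b <= 0x9F for b in data):
        # ISO 2022 does not allow graphic characters in this range, so the
        # data is not a conforming 8-bit ISO 2022 stream.
        return "no-conversion"
    return "iso-8-1"


print(guess_category(b"\x1b$B0!\x1b(B"))
print(guess_category(bytes([0x7A, 0xB1])))
```

In this sketch a Latin-1 file and a Latin-2 file both end up in the same
8-bit category, which is consistent with the earlier observation that
the autorecognizer cannot tell ISO 8859/1 from ISO 8859/2.
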
@c end of mule.texi
