diff man/mule/mule.texi @ 70:131b0175ea99 r20-0b30

Import from CVS: tag r20-0b30
author cvs
date Mon, 13 Aug 2007 09:02:59 +0200
parents
children 360340f9fd5f
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/man/mule/mule.texi	Mon Aug 13 09:02:59 2007 +0200
@@ -0,0 +1,544 @@
+
+@node Coding-system
+@section Coding-system
+
+@noindent
+A `coding-system' is a method for encoding several
+character sets.  It is represented by a symbol which has
+the properties 'coding-system and 'eol-type.
+
+You can specify a different coding-system for each of file
+I/O, process I/O, output to the terminal (if not running on
+X), and input from the keyboard (if not running on X).
+
+
+@menu
+* Structure::   Structure of coding-system
+	  o Property 'coding-system
+	  o Property 'eol-type
+	  o Property 'post-read-conversion
+	  o Property 'pre-write-conversion
+* Creation::   How to create coding-system?
+* Predefined coding-system::
+* Automatic conversion::
+	  o Category of coding-system
+	  o How automatic conversion works?
+	  o Priority of category
+* Mode-line::   How coding-system is shown in mode-line?
+* ISO2022 restriction::
+* Big5::        Special treatment of Big5
+@end menu
+
+@node Structure
+@subsection Structure of coding-system
+
+@subsubsection Property 'coding-system
+
+The value of the property 'coding-system is a vector:
+@quotation
+  [ TYPE MNEMONIC DOCUMENT DUMMY FLAGS ]
+@end quotation
+or another coding-system.  The contents of the vector are:
+@example
+  TYPE:	nil: no conversion, t: automatic conversion,
+	0:Internal, 1:Shift-JIS, 2:ISO2022, 3:Big5, 4:CCL.
+  MNEMONIC: a character shown at mode-line to indicate the coding-system.
+  DOCUMENT: a documentation string describing the coding-system.
+  DUMMY: always nil (for backward compatibility)
+  FLAGS (optional): more precise information about the coding-system,
+    If TYPE is 2 (ISO2022), FLAGS should be a list of:
+      LB-G0, LB-G1, LB-G2, LB-G3:
+	Leading character of charset initially designated to G? graphic set,
+	nil means G? is not designated initially,
+	lb-invalid means G? can never be designated to,
+	if (- leading-char) is specified, it is designated on output,
+      SHORT: non-nil - allow short forms such as "ESC $ B", nil - always "ESC $ ( B",
+      ASCII-EOL: non-nil - designate ASCII to g0 at end of line on output,
+      ASCII-CNTL: non-nil - designate ASCII to g0 at control codes on output
+      SEVEN: non-nil - use 7-bit environment on output,
+      LOCK-SHIFT: non-nil - use locking-shift (SO/SI) instead of single-shift
+	or designation by escape sequence,
+      USE-ROMAN: non-nil - designate JIS0201-1976-Roman instead of ASCII,
+      USE-OLDJIS: non-nil - designate JIS0208-1976 instead of JIS0208-1983,
+      NO-ISO6429: non-nil - don't use ISO6429's direction specification,
+    If TYPE is 3 (Big5), FLAGS `t' means Big5-ETen, `nil' means Big5-HKU,
+    If TYPE is 4 (CCL), FLAGS should be a cons of CCL programs
+      for encoding and decoding.  See documentation of CCL for more detail.
+@end example
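+
+As a rough illustration of this layout, the following sketch builds such
+a vector by hand and reads its fields back.  It uses only plain Emacs
+Lisp.  The symbols `my-coding' and `my-alias' and the function
+`my-coding-spec' are hypothetical names, and following a chain of
+symbols is an assumption about how a property value that names another
+coding-system is resolved.
+
+@example
+;; Hypothetical coding-system properties, for illustration only.
+;; FLAGS is left nil here for brevity.
+(put 'my-coding 'coding-system [2 ?J "An ISO2022 example" nil nil])
+(put 'my-alias 'coding-system 'my-coding)
+
+(defun my-coding-spec (symbol)
+  "Return the vector behind SYMBOL, following symbol values."
+  (let ((spec (get symbol 'coding-system)))
+    (while (and spec (symbolp spec))
+      (setq spec (get spec 'coding-system)))
+    spec))
+
+(aref (my-coding-spec 'my-alias) 0)     ; TYPE
+(aref (my-coding-spec 'my-alias) 1)     ; MNEMONIC
+@end example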
+
+@subsubsection Property 'eol-type
+
+The value of the property 'eol-type is one of:
+@example
+  nil: no conversion for end-of-line type
+  1:   LF
+  2:   CRLF
+  3:   CR
+  vector of length 3: automatic detection of end-of-line type.
+	1st element: coding-system of eol-type LF
+	2nd element: coding-system of eol-type CRLF
+	3rd element: coding-system of eol-type CR
+@end example
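+
+The vector case can be sketched as follows.  This is plain Emacs Lisp,
+not Mule's own code; `my-eol-variant' and the symbols in the example
+are hypothetical names.
+
+@example
+(defun my-eol-variant (coding eol)
+  "Return the coding-system to use for CODING once EOL is known.
+EOL is 1 (LF), 2 (CRLF) or 3 (CR)."
+  (let ((type (get coding 'eol-type)))
+    (if (vectorp type)
+        (aref type (1- eol))    ; pick the LF, CRLF or CR variant
+      coding)))                 ; nothing to choose from
+
+(put 'my-coding 'eol-type
+     [my-coding-unix my-coding-dos my-coding-mac])
+(my-eol-variant 'my-coding 2)
+@end example
+
+The last form returns `my-coding-dos', the second element of the vector.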
+
+@subsubsection Property 'post-read-conversion
+
+The value of the property 'post-read-conversion is a
+function to convert some text just read into a buffer.  When
+the function is called, the text has already been converted
+according to 'coding-system and 'eol-type of the
+coding-system.  The arguments of the function are START and
+END, the bounds of the region of inserted text.
+
+@subsubsection Property 'pre-write-conversion
+
+The value of the property 'pre-write-conversion is a
+function to convert some text just before writing it out.
+After the function is called, the text is converted according
+to 'coding-system and 'eol-type of the coding-system.  The
+arguments of the function are START and END, the bounds of
+the region of the text.
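+
+Both conversion properties hold a function of two arguments.  A
+post-read conversion of this kind might look like the sketch below.
+The names `my-tabs-to-spaces' and `my-coding' are hypothetical, and
+attaching the function with `put' is an assumption based on the
+property being an ordinary symbol property.
+
+@example
+(defun my-tabs-to-spaces (start end)
+  "Replace each tab between START and END with a space."
+  (subst-char-in-region start end ?\t 32))
+
+;; Run it on text just read with the hypothetical coding-system.
+(put 'my-coding 'post-read-conversion 'my-tabs-to-spaces)
+@end example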
+
+@node Creation
+@subsection How to create coding-system?
+
+Mule provides a function `make-coding-system' to create a
+coding-system.
+
+FUNCTION make-coding-system: NAME TYPE MNEMONIC DOC &optional EOL-TYPE FLAGS
+
+Register symbol NAME as a coding-system whose 'coding-system
+property is a vector [ TYPE MNEMONIC DOC nil FLAGS ] and
+'eol-type property is EOL-TYPE.  If `t' is specified as
+EOL-TYPE, the value of 'eol-type property is a vector of
+generated coding-systems whose 'eol-type properties are 1
+(LF), 2 (CRLF), and 3 (CR).  The names of generated
+coding-systems are NAMEunix, NAMEdos, and NAMEmac respectively.
+
+To make an alias of an existing coding-system, call the function
+`copy-coding-system'.
+
+FUNCTION copy-coding-system: ORIGINAL ALIAS
+
+Make the same coding-system as ORIGINAL and name it ALIAS.
+If 'eol-type property of ORIGINAL is a vector, coding-systems
+ALIASunix, ALIASdos, and ALIASmac are generated, and
+'eol-type property of ALIAS becomes a vector of them.
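+
+For instance, assuming the signatures documented above, a Shift-JIS
+coding-system and an alias for it could be registered as follows.  The
+names `*my-sjis*' and `*my-sjis-alias*' are hypothetical.
+
+@example
+;; TYPE 1 is Shift-JIS; EOL-TYPE t asks for the three eol variants.
+(make-coding-system
+ '*my-sjis* 1 ?S "Hypothetical Shift-JIS coding-system." t)
+
+;; Per the description above, this also generates
+;; *my-sjis-alias*unix, *my-sjis-alias*dos, and *my-sjis-alias*mac.
+(copy-coding-system '*my-sjis* '*my-sjis-alias*)
+@end example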
+
+@node Predefined coding-system
+@subsection Predefined coding-system
+
+See lisp/mule.el.
+
+@node Automatic conversion
+@subsection Automatic conversion
+
+@subsubsection Category of coding-system
+
+Mule has a facility to detect the coding-system of text
+automatically.  However, what Mule actually detects is not a
+coding-system itself but a category of coding-system.  A
+category is also represented by a symbol, and its value should
+be an actual coding-system.
+
+There are eight categories:
+@table @asis
+@item *coding-category-internal*
+	coding-system used in a buffer
+@item *coding-category-sjis*
+	Shift-JIS
+@item *coding-category-iso-7*
+	ISO2022 variation with the following features:
+	  o no locking shift and no single shift
+	  o only G0 is used
+@item *coding-category-iso-8-1*
+	ISO2022 variation with the following features:
+	  o no locking shift
+	  o designation sequence is allowed only for G0 and G1
+	  o G1 is used only for 1-byte character set
+@item *coding-category-iso-8-2*
+	ISO2022 variation with the following features:
+	  o no locking shift
+	  o designation sequence is allowed only for G0 and G1
+	  o G1 is used only for 2-byte character set
+@item *coding-category-iso-else*
+	ISO2022 variation which doesn't satisfy any of the above.
+@item *coding-category-big5*
+	Big5 (ETen or HKU)
+@item *coding-category-bin*
+	Any other coding-system which uses MSB.
+@end table
+
+The values of these symbols are pre-defined as follows:
+
+@example
+----- lisp/mule.el -----------------------------------------
+(defvar *coding-category-internal* '*internal*)
+(defvar *coding-category-sjis* '*sjis*)
+(defvar *coding-category-iso-7* '*junet*)
+(defvar *coding-category-iso-8-1* '*ctext*)
+(defvar *coding-category-iso-8-2* '*euc-japan*)
+(defvar *coding-category-iso-else* '*iso-2022-ss2-7*)
+(defvar *coding-category-big5* '*big5-eten*)
+(defvar *coding-category-bin* '*noconv*)
+------------------------------------------------------------
+@end example
+
+However, some of them are overridden in language-specific
+files such as japanese.el, chinese.el, etc.
+
+@subsubsection How automatic conversion works?
+
+When coding-system `*autoconv*' is specified on reading text
+(this is the default), Mule tries to detect the category of
+coding-system in which the text is encoded.  If an appropriate
+category is found, it converts the text according to the
+coding-system bound to the category.  If the 'eol-type
+property of the coding-system is a vector of coding-systems
+and Mule detects the type of end-of-line (LF, CRLF, or CR) of
+the text, the corresponding one of those coding-systems is used.
+
+Automatic conversion occurs both on reading from files and
+on receiving input from a process.  In the latter case, if some
+coding-system is found, the output-coding-system of the process
+is also set to the found coding-system.
+
+@subsubsection Priority of category
+
+If two or more categories are found, the category of the
+highest priority is selected.
+
+The priority of categories is pre-defined as follows:
+
+@example
+----- lisp/mule.el -----------------------------------------
+(set-coding-priority
+ '(*coding-category-iso-8-2*
+   *coding-category-sjis*
+   *coding-category-iso-8-1*
+   *coding-category-big5*
+   *coding-category-iso-7*
+   *coding-category-iso-else*
+   *coding-category-bin*
+   *coding-category-internal*))
+------------------------------------------------------------
+@end example
+
+The function `set-coding-priority' puts a property 'priority
+on each element of its argument, numbered from 0 to 7 (a smaller
+number means a higher priority).  Some language-specific files
+may override this priority.
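+
+The selection rule can be sketched in plain Emacs Lisp as follows.
+This is not the code in lisp/mule.el; the functions `my-set-priority'
+and `my-best-category' and the `cat-...' symbols are hypothetical
+stand-ins for the real function and categories.
+
+@example
+(defun my-set-priority (categories)
+  "Give each symbol in CATEGORIES a 'priority of 0, 1, 2, ..."
+  (let ((n 0))
+    (while categories
+      (put (car categories) 'priority n)
+      (setq n (1+ n)
+            categories (cdr categories)))))
+
+(defun my-best-category (found)
+  "Return the symbol in FOUND with the smallest 'priority."
+  (let ((best (car found)))
+    (while found
+      (if (< (get (car found) 'priority) (get best 'priority))
+          (setq best (car found)))
+      (setq found (cdr found)))
+    best))
+
+(my-set-priority '(cat-iso-8-2 cat-sjis cat-iso-8-1 cat-big5))
+(my-best-category '(cat-big5 cat-sjis))
+@end example
+
+The last form returns `cat-sjis', whose priority (1) is smaller than
+that of `cat-big5' (3).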
+
+@node Mode-line
+@subsection How coding-system is shown in mode-line?
+
+Each coding-system has a unique mnemonic (one character).
+By default, the mnemonic of the `file-coding-system' of a buffer
+is shown at the left of the mode-line of the buffer.  In addition,
+the mnemonic is followed by another mnemonic showing the
+eol-type of the coding-system.  This mnemonic is defined as
+follows:
+@example
+	".": LF
+	":": CRLF
+	"'": CR
+	"_": not yet decided
+	"-": nil (for coding-system of nil, *noconv*, or *internal*)
+@end example
+So the usual appearance of the mode-line for a buffer which is
+visiting a file (*junet* encoding on a Unix system) is:
+
+@example
+	    +-- mnemonic of file-coding-system
+	    |+-- mnemonic of eol-type
+	    VV
+	[--]J.:----Mule: filename
+@end example
+
+The leftmost bracket pair is the indicator for the input method.
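+
+The eol-type mnemonics listed earlier in this section amount to a
+small mapping, which can be sketched in plain Emacs Lisp.
+`my-eol-mnemonic' is a hypothetical name, and treating a vector
+(automatic detection) as the ``not yet decided'' case is an assumption.
+
+@example
+(defun my-eol-mnemonic (eol-type)
+  "Return the mode-line mnemonic string for EOL-TYPE."
+  (cond ((null eol-type) "-")       ; no conversion
+        ((vectorp eol-type) "_")    ; assumed: still to be detected
+        ((eq eol-type 1) ".")       ; LF
+        ((eq eol-type 2) ":")       ; CRLF
+        ((eq eol-type 3) "'")))     ; CR
+
+(my-eol-mnemonic 2)
+@end example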
+
+When a buffer is attached to some process, the coding-systems
+for input and output of the process are also shown as
+follows:
+
+@example
+	    +-- mnemonic of file-coding-system
+	    |+-- mnemonic of eol-type of file-coding-system
+	    ||+-- mnemonic of input-coding-system of a process
+	    |||+-- mnemonic of eol-type of input-coding-system
+	    ||||+-- mnemonic of output-coding-system of a process
+	    |||||+-- mnemonic of eol-type of output-coding-system
+	    VVVVVV
+	[--]+_+.--:--**-Mule: *shell*
+@end example
+
+This means that Mule is now communicating with the shell with
+coding-system *autoconv*unix ("+.") for input and nil
+("--") for output.
+
+@node ISO2022 restriction
+@subsection ISO2022 restriction
+
+For decoding to Type 2 (ISO2022), we have the following
+restrictions:
+
+@table @asis
+@item Locking-Shift:
+Use SI and SO only when decoding with a coding-system
+whose LOCK-SHIFT and SEVEN are both t.
+
+@item Single-Shift:
+Use SS2 and SS3 (if SEVEN is nil) or ESC N and ESC O (if
+SEVEN is t).
+
+@item Invocation:
+G0 is always invoked to GL, G1 to GR (but only if SEVEN is
+nil).  G2 and G3 are invoked to GL by Single-Shift of SS2
+and SS3.
+
+@item Unofficial use of ESC sequence for designation:
+If SEVEN is t, LOCK-SHIFT is nil, and designation to G2
+and G3 is prohibited, we should designate all character
+sets to G0 (and hence invoke to GL).  To designate a 96-character
+set to G0, we use "ESC , <F>".  For instance, to
+designate ISO8859-1 to G0, we use "ESC , A".
+
+@item Unofficial use of ESC sequence for composite character:
+To indicate the start and end of a composite character, we
+use ESC 0 (start) and ESC 1 (end).
+
+@item Text direction specifier of ISO6429
+We use ISO6429's ESC sequence "ESC [ 2 ]" to change text
+direction to right-to-left, and "ESC [ 0 ]" to revert it
+to left-to-right.
+@end table
+
+@node Big5
+@subsection Special treatment of Big5
+
+As far as I know, there are several different codes called
+Big5.  The most famous ones are Big5-ETen and
+Big5-HKU-form2.  Since both of them use the code range 0xa140
+- 0xfefe (in each row, columns (second byte) 0x7f - 0xa0 are
+skipped) and the number of characters is more than 13000, it's
+impossible to treat each of them as a single character set
+in the current Mule system.  So Mule treats them in a quite
+irregular manner, as described below:
+
+@enumerate
+@item
+Mule does not treat them as different character sets,
+but as the same character set called Big5.
+	Caution!! Big5 is a different character set from GB.
+
+@item
+Mule divides Big5 into two sub-character-sets:
+	0xa140 - 0xc67e (Level 1)
+	0xc6a1 - 0xfefe (Level 2)
+and allocates two leading-chars lc-big5-1 and lc-big5-2 to
+them.  (See character.txt)
+
+@item
+Usually, each leading-char (or character-set) has a unique
+character category.  But lc-big5-1 and lc-big5-2 have the
+same character category, with mnemonic 't'.  So the regular
+expression "\\ct" matches any Big5 (Level 1 and Level 2)
+character.  (See syntax.txt)
+
+@item
+If you specify ISO2022 type coding-system on output,
+Mule converts Big5 code using unofficial final-characters
+'0' (for Level 1) and '1' (for Level 2).
+
+@item
+You can use fonts of either ETen or HKU for displaying
+Big5 code.  Mule judges which font is used by examining
+the existence of the character whose code point is 0xC6A1.  If it
+exists, the font is HKU; otherwise the font is ETen.
+@end enumerate
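+
+The division into Level 1 and Level 2 can be sketched in plain Emacs
+Lisp as follows.  `my-big5-level' is a hypothetical name, not a Mule
+function, and the lower and upper bounds on the second byte (0x40 and
+0xfe) are an assumption read off the code range given above.
+
+@example
+(defun my-big5-level (code)
+  "Return 1 or 2 for the Big5 level of CODE, or nil if invalid."
+  (let ((low (logand code 255)))      ; second byte
+    (cond
+     ;; Second byte must lie in 0x40-0x7e or 0xa1-0xfe.
+     ((not (or (and (>= low #x40) (<= low #x7e))
+               (and (>= low #xa1) (<= low #xfe))))
+      nil)
+     ((and (>= code #xa140) (<= code #xc67e)) 1)
+     ((and (>= code #xc6a1) (<= code #xfefe)) 2))))
+
+(my-big5-level #xa440)
+(my-big5-level #xc940)
+(my-big5-level #xc690)
+@end example
+
+These three calls return 1, 2, and nil respectively; the last code is
+rejected because its second byte falls in the skipped columns.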
+
+@node Syntax
+@section Syntax and Category of character
+
+@subsection Syntax
+
+Mule can define the syntax of all multi-byte characters with
+@code{modify-syntax-entry}.
+
+The first argument of @code{modify-syntax-entry} should be one of the following:
+@enumerate
+@item
+ASCII character
+@item
+multi-byte character
+@item
+leading character of multi-byte character
+@item
+partially defined characters returned by:
+
+@quotation
+@code{(make-character leading-char arg)}
+@end quotation
+@end enumerate
+
+There is a restriction on specifying a matching character in the
+second argument.  If the first argument specifies a multi-byte
+character or the leading char of a multi-byte character, the
+matching character should have the same leading character.  If
+the character is a 2-byte code, its first byte should
+also be the same as the first byte of the first argument.
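+
+For the simplest case, an ASCII first argument, a matching pair is
+declared as in the sketch below, which uses only the standard
+functions @code{make-syntax-table} and @code{modify-syntax-entry}.
+The choice of angle brackets is arbitrary.  For multi-byte characters
+the same form applies, subject to the restriction just described.
+
+@example
+(let ((table (make-syntax-table)))
+  ;; "(>" means: open-paren syntax, matching character ">".
+  (modify-syntax-entry ?< "(>" table)
+  ;; ")<" means: close-paren syntax, matching character "<".
+  (modify-syntax-entry ?> ")<" table)
+  table)
+@end example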
+
+@subsection Category
+
+Like syntax, category also defines characteristics of
+characters.  The differences are:
+@enumerate
+@item
+Each character can have more than one category.
+@item
+Users can define new types of category as they wish.
+	Example: See japanese.el
+@item
+@code{char-category} returns all mnemonics of the character as a string.
+@item
+For regular expression search, you can use \cm or \Cm (where `m' stands
+for any category mnemonic) instead of \sm and \Sm.
+@end enumerate
+
+@node Font
+@section Font
+
+A FONTSET is a set of fonts which have the same height and style.  A
+fontset should ideally contain enough fonts to display characters of
+various character sets.
+
+Mule uses fontsets instead of fonts.  You can specify a fontset at any
+place where you can specify a font.  You can still specify a font, in
+which case a fontset which includes that font is searched for and used.
+
+Like a font, a fontset is specified by a string giving its name.
+
+@menu
+* Initial fontsets::	Fontsets which Mule has at startup time.
+* Specify fontset::     How to specify a fontset?
+* Manage fontset::      How to create or modify a fontset?
+@end menu
+
+@node Initial fontsets
+@subsection Initial fontsets
+
+@subsubsection "default-fontset"
+
+Mule automatically creates a fontset named "default-fontset" at startup
+time.  Each font in this fontset is specified by a very generic name such
+as "-*-fixed-medium-r-*--16-*-iso8859-1" for ASCII and
+"-*-fixed-medium-r-*--*-jisx0208.1983-*" for JISX0208 (Kanji).
+These values are defined in @file{lisp/term/x-win.el}.
+
+If there are no other fontsets specified by X's resource, "default-fontset"
+is used for the first frame of Mule.
+
+In most cases, this is enough.  You probably don't have to have any
+other fontsets.
+
+@subsubsection X's resource
+
+Mule also creates fontsets specified in X's resource "fontSetList (class
+FontSetList)".  The value is a comma separated list of fontset names.
+
+@example
+*FontSetList: 16,24
+@end example
+
+The actual contents of each fontset are specified by "fontSet-xxx (class
+FontSet-xxx)" where "xxx" is the name of the corresponding fontset.  The
+value of this resource is a comma separated list of font names.
+
+@example
+*FontSet-16: -etl-fixed-medium-r-*--16-*-iso8859-1
+@end example
+
+Each font name should not contain the wildcards `*' or `?' in the
+CHARSET_REGISTRY field, because the character set for this font is
+recognized by this field.  This means that you don't have to care about
+the order of font names.
+
+For instance,
+
+@example
+*FontSet-16:\
+	-etl-fixed-medium-r-*--16-*-iso8859-1,\
+	-ming-fixed-medium-r-*--*-*-jisx0208.1983-*
+@end example
+
+is enough to tell Mule that the fontset "16" contains an ASCII font and
+a JISX0208 font.  Please note that the second name has only a wildcard in
+the PIXEL_SIZE field.  Since Mule tries to open a font of the same PIXEL_SIZE
+as the ASCII font of the same fontset, you had better not specify an actual
+value in the PIXEL_SIZE field except for the ASCII font.
+
+As for fonts not listed in the specification of a fontset, the corresponding
+font names in "default-fontset" are used.
+
+The first fontset in FontSetList is used for the first frame of Mule.
+If you want to use "default-fontset" while specifying other fontsets in
+the resource, please put "default-fontset" first in the value.
+
+@example
+*FontSetList: default-fontset,16,24
+@end example
+
+In this case, you don't have to have the resource
+"FontSet-default-fontset".
+
+@node Specify fontset
+@subsection How to specify a fontset?
+
+You can specify a fontset at any place where you can specify a font.
+
+To change the fontset used for the first frame of Mule:
+
+@enumerate
+@item
+command line arguments "-fn xxx" or "-font xxx"
+
+If this argument exists, a fontset is searched for in the following order:
+@enumerate
+@item
+A fontset whose name is "xxx".
+@item
+A fontset which contains ASCII font "xxx".
+@item
+Create a new fontset "xxx" which contains ASCII font "xxx".
+@end enumerate
+
+@item
+In your ~/.emacs,
+
+@example
+(setcdr (assoc 'font default-frame-alist) "xxx")
+@end example
+
+@end enumerate
+
+To change the fontset after Mule has started:
+
+@enumerate
+@item
+By the command
+
+@example
+M-x set-default-fontset<CR>xxx<CR>
+@end example
+
+@item
+By @key{Ctl-Mouse-3}
+
+@end enumerate
+
+@node Manage fontset
+@subsection How to create or modify a fontset?
+
+You can create a new fontset with `new-fontset' and modify an
+existing fontset with `set-fontset-font'.
+
+You can get a list of the fontsets currently created with
+`fontset-list'.
+
+You can check whether a fontset has already been created with
+`fontsetp'.