comparison man/lispref/strings.texi @ 412:697ef44129c6 r21-2-14

Import from CVS: tag r21-2-14
author cvs
date Mon, 13 Aug 2007 11:20:41 +0200
parents 74fd4e045ea6
411:12e008d41344 412:697ef44129c6
16 buffers, and for many other purposes. Because strings are so important, 16 buffers, and for many other purposes. Because strings are so important,
17 XEmacs Lisp has many functions expressly for manipulating them. XEmacs 17 XEmacs Lisp has many functions expressly for manipulating them. XEmacs
18 Lisp programs use strings more often than individual characters. 18 Lisp programs use strings more often than individual characters.
19 19
20 @menu 20 @menu
21 * String Basics:: Basic properties of strings and characters. 21 * Basics: String Basics. Basic properties of strings and characters.
22 * Predicates for Strings:: Testing whether an object is a string or char. 22 * Predicates for Strings:: Testing whether an object is a string or char.
23 * Creating Strings:: Functions to allocate new strings. 23 * Creating Strings:: Functions to allocate new strings.
24 * Predicates for Characters:: Testing whether an object is a character. 24 * Predicates for Characters:: Testing whether an object is a character.
25 * Character Codes:: Each character has an equivalent integer. 25 * Character Codes:: Each character has an equivalent integer.
26 * Text Comparison:: Comparing characters or strings. 26 * Text Comparison:: Comparing characters or strings.
43 determined only by how it is used. @xref{Character Type}. 43 determined only by how it is used. @xref{Character Type}.
44 44
45 The length of a string (like any array) is fixed and independent of 45 The length of a string (like any array) is fixed and independent of
46 the string contents, and cannot be altered. Strings in Lisp are 46 the string contents, and cannot be altered. Strings in Lisp are
47 @emph{not} terminated by a distinguished character code. (By contrast, 47 @emph{not} terminated by a distinguished character code. (By contrast,
48 strings in C are terminated by a character with @sc{ascii} code 0.) 48 strings in C are terminated by a character with @sc{ASCII} code 0.)
49 This means that any character, including the null character (@sc{ascii} 49 This means that any character, including the null character (@sc{ASCII}
50 code 0), is a valid element of a string.@refill 50 code 0), is a valid element of a string.@refill
51 51
52 Since strings are considered arrays, you can operate on them with the 52 Since strings are considered arrays, you can operate on them with the
53 general array functions. (@xref{Sequences Arrays Vectors}.) For 53 general array functions. (@xref{Sequences Arrays Vectors}.) For
54 example, you can access or change individual characters in a string 54 example, you can access or change individual characters in a string
319 319
320 @table @asis 320 @table @asis
321 @item 0 - 31 321 @item 0 - 31
322 Control set 0 322 Control set 0
323 @item 32 - 127 323 @item 32 - 127
324 @sc{ascii} 324 @sc{ASCII}
325 @item 128 - 159 325 @item 128 - 159
326 Control set 1 326 Control set 1
327 @item 160 - 255 327 @item 160 - 255
328 Right half of ISO-8859-1 328 Right half of ISO-8859-1
329 @end table 329 @end table
330 330
331 If support for @sc{mule} does not exist, these are the only valid 331 If support for @sc{MULE} does not exist, these are the only valid
332 character values. When @sc{mule} support exists, the values assigned to 332 character values. When @sc{MULE} support exists, the values assigned to
333 other characters may vary depending on the particular version of XEmacs, 333 other characters may vary depending on the particular version of XEmacs,
334 the order in which character sets were loaded, etc., and you should not 334 the order in which character sets were loaded, etc., and you should not
335 depend on them. 335 depend on them.
336 @end defun 336 @end defun
337 337
425 the character from @var{string1}, then @var{string1} is less, and this 425 the character from @var{string1}, then @var{string1} is less, and this
426 function returns @code{t}. If the lesser character is the one from 426 function returns @code{t}. If the lesser character is the one from
427 @var{string2}, then @var{string1} is greater, and this function returns 427 @var{string2}, then @var{string1} is greater, and this function returns
428 @code{nil}. If the two strings match entirely, the value is @code{nil}. 428 @code{nil}. If the two strings match entirely, the value is @code{nil}.
429 429
430 Pairs of characters are compared by their @sc{ascii} codes. Keep in 430 Pairs of characters are compared by their @sc{ASCII} codes. Keep in
431 mind that lower case letters have higher numeric values in the 431 mind that lower case letters have higher numeric values in the
432 @sc{ascii} character set than their upper case counterparts; numbers and 432 @sc{ASCII} character set than their upper case counterparts; numbers and
433 many punctuation characters have a lower numeric value than upper case 433 many punctuation characters have a lower numeric value than upper case
434 letters. 434 letters.
435 435
436 @example 436 @example
437 @group 437 @group
513 @defun string-to-char string 513 @defun string-to-char string
514 @cindex string to character 514 @cindex string to character
515 This function returns the first character in @var{string}. If the 515 This function returns the first character in @var{string}. If the
516 string is empty, the function returns 0. (Under XEmacs 19, the value is 516 string is empty, the function returns 0. (Under XEmacs 19, the value is
517 also 0 when the first character of @var{string} is the null character, 517 also 0 when the first character of @var{string} is the null character,
518 @sc{ascii} code 0.) 518 @sc{ASCII} code 0.)
519 519
520 @example 520 @example
521 (string-to-char "ABC") 521 (string-to-char "ABC")
522 @result{} ?A ;; @r{Under XEmacs 20.} 522 @result{} ?A ;; @r{Under XEmacs 20.}
523 @result{} 65 ;; @r{Under XEmacs 19.} 523 @result{} 65 ;; @r{Under XEmacs 19.}
608 @node String Properties 608 @node String Properties
609 @section String Properties 609 @section String Properties
610 @cindex string properties 610 @cindex string properties
611 @cindex properties of strings 611 @cindex properties of strings
612 612
613 Just as with symbols, extents, faces, and glyphs, you can attach 613 Similar to symbols, extents, faces, and glyphs, you can attach
614 additional information to strings in the form of @dfn{string 614 additional information to strings in the form of @dfn{string
615 properties}. These differ from text properties, which are logically 615 properties}. These differ from text properties, which are logically
616 attached to particular characters in the string. 616 attached to particular characters in the string.
617 617
618 To attach a property to a string, use @code{put}. To retrieve a property 618 To attach a property to a string, use @code{put}. To retrieve a property
619 from a string, use @code{get}. You can also use @code{remprop} to remove 619 from a string, use @code{get}. You can also use @code{remprop} to remove
620 a property from a string and @code{object-plist} to retrieve a list of 620 a property from a string and @code{object-props} to retrieve a list of
621 all the properties in a string. 621 all the properties in a string.
622 622
623 @node Formatting Strings 623 @node Formatting Strings
624 @section Formatting Strings 624 @section Formatting Strings
625 @cindex formatting strings 625 @cindex formatting strings
908 characters (the letters @samp{A} through @samp{Z} and @samp{a} through 908 characters (the letters @samp{A} through @samp{Z} and @samp{a} through
909 @samp{z}); other characters are not altered. The functions do not 909 @samp{z}); other characters are not altered. The functions do not
910 modify the strings that are passed to them as arguments. 910 modify the strings that are passed to them as arguments.
911 911
912 The examples below use the characters @samp{X} and @samp{x} which have 912 The examples below use the characters @samp{X} and @samp{x} which have
913 @sc{ascii} codes 88 and 120 respectively. 913 @sc{ASCII} codes 88 and 120 respectively.
914 914
915 @defun downcase string-or-char 915 @defun downcase string-or-char
916 This function converts a character or a string to lower case. 916 This function converts a character or a string to lower case.
917 917
918 When the argument to @code{downcase} is a string, the function creates 918 When the argument to @code{downcase} is a string, the function creates
993 You can customize case conversion by installing a special @dfn{case 993 You can customize case conversion by installing a special @dfn{case
994 table}. A case table specifies the mapping between upper case and lower 994 table}. A case table specifies the mapping between upper case and lower
995 case letters. It affects both the string and character case conversion 995 case letters. It affects both the string and character case conversion
996 functions (see the previous section) and those that apply to text in the 996 functions (see the previous section) and those that apply to text in the
997 buffer (@pxref{Case Changes}). You need a case table if you are using a 997 buffer (@pxref{Case Changes}). You need a case table if you are using a
998 language which has letters other than the standard @sc{ascii} letters. 998 language which has letters other than the standard @sc{ASCII} letters.
999 999
1000 A case table is a list of this form: 1000 A case table is a list of this form:
1001 1001
1002 @example 1002 @example
1003 (@var{downcase} @var{upcase} @var{canonicalize} @var{equivalences}) 1003 (@var{downcase} @var{upcase} @var{canonicalize} @var{equivalences})
1020 equivalent; any two characters that are related by case-conversion have 1020 equivalent; any two characters that are related by case-conversion have
1021 the same canonical equivalent character. 1021 the same canonical equivalent character.
1022 1022
1023 The element @var{equivalences} is a map that cyclicly permutes each 1023 The element @var{equivalences} is a map that cyclicly permutes each
1024 equivalence class (of characters with the same canonical equivalent). 1024 equivalence class (of characters with the same canonical equivalent).
1025 (For ordinary @sc{ascii}, this would map @samp{a} into @samp{A} and 1025 (For ordinary @sc{ASCII}, this would map @samp{a} into @samp{A} and
1026 @samp{A} into @samp{a}, and likewise for each set of equivalent 1026 @samp{A} into @samp{a}, and likewise for each set of equivalent
1027 characters.) 1027 characters.)
1028 1028
1029 When you construct a case table, you can provide @code{nil} for 1029 When you construct a case table, you can provide @code{nil} for
1030 @var{canonicalize}; then Emacs fills in this string from @var{upcase} 1030 @var{canonicalize}; then Emacs fills in this string from @var{upcase}
1061 @defun set-case-table table 1061 @defun set-case-table table
1062 This sets the current buffer's case table to @var{table}. 1062 This sets the current buffer's case table to @var{table}.
1063 @end defun 1063 @end defun
1064 1064
1065 The following three functions are convenient subroutines for packages 1065 The following three functions are convenient subroutines for packages
1066 that define non-@sc{ascii} character sets. They modify a string 1066 that define non-@sc{ASCII} character sets. They modify a string
1067 @var{downcase-table} provided as an argument; this should be a string to 1067 @var{downcase-table} provided as an argument; this should be a string to
1068 be used as the @var{downcase} part of a case table. They also modify 1068 be used as the @var{downcase} part of a case table. They also modify
1069 the standard syntax table. @xref{Syntax Tables}. 1069 the standard syntax table. @xref{Syntax Tables}.
1070 1070
1071 @defun set-case-syntax-pair uc lc downcase-table 1071 @defun set-case-syntax-pair uc lc downcase-table
1107 1107
1108 Note that char tables as a primitive type, and all of the functions in 1108 Note that char tables as a primitive type, and all of the functions in
1109 this section, exist only in XEmacs 20. In XEmacs 19, char tables are 1109 this section, exist only in XEmacs 20. In XEmacs 19, char tables are
1110 generally implemented using a vector of 256 elements. 1110 generally implemented using a vector of 256 elements.
1111 1111
1112 When @sc{mule} support exists, the types of ranges that can be assigned 1112 When @sc{MULE} support exists, the types of ranges that can be assigned
1113 values are 1113 values are
1114 1114
1115 @itemize @bullet 1115 @itemize @bullet
1116 @item 1116 @item
1117 all characters 1117 all characters
1121 a single row in a two-octet charset 1121 a single row in a two-octet charset
1122 @item 1122 @item
1123 a single character 1123 a single character
1124 @end itemize 1124 @end itemize
1125 1125
1126 When @sc{mule} support is not present, the types of ranges that can be 1126 When @sc{MULE} support is not present, the types of ranges that can be
1127 assigned values are 1127 assigned values are
1128 1128
1129 @itemize @bullet 1129 @itemize @bullet
1130 @item 1130 @item
1131 all characters 1131 all characters
1152 @item category 1152 @item category
1153 Used for category tables, which specify the regexp categories 1153 Used for category tables, which specify the regexp categories
1154 that a character is in. The valid values are @code{nil} or a 1154 that a character is in. The valid values are @code{nil} or a
1155 bit vector of 95 elements. Higher-level Lisp functions are 1155 bit vector of 95 elements. Higher-level Lisp functions are
1156 provided for working with category tables. Currently categories 1156 provided for working with category tables. Currently categories
1157 and category tables only exist when @sc{mule} support is present. 1157 and category tables only exist when @sc{MULE} support is present.
1158 @item char 1158 @item char
1159 A generalized char table, for mapping from one character to 1159 A generalized char table, for mapping from one character to
1160 another. Used for case tables, syntax matching tables, 1160 another. Used for case tables, syntax matching tables,
1161 @code{keyboard-translate-table}, etc. The valid values are characters. 1161 @code{keyboard-translate-table}, etc. The valid values are characters.
1162 @item generic 1162 @item generic
1201 1201
1202 @itemize @bullet 1202 @itemize @bullet
1203 @item 1203 @item
1204 @code{t} (all characters are affected) 1204 @code{t} (all characters are affected)
1205 @item 1205 @item
1206 A charset (only allowed when @sc{mule} support is present) 1206 A charset (only allowed when @sc{MULE} support is present)
1207 @item 1207 @item
1208 A vector of two elements: a two-octet charset and a row number 1208 A vector of two elements: a two-octet charset and a row number
1209 (only allowed when @sc{mule} support is present) 1209 (only allowed when @sc{MULE} support is present)
1210 @item 1210 @item
1211 A single character 1211 A single character
1212 @end itemize 1212 @end itemize
1213 1213
1214 @var{val} must be a value appropriate for the type of @var{table}. 1214 @var{val} must be a value appropriate for the type of @var{table}.