xemacs-beta (Mercurial): comparison of man/lispref/strings.texi @ 412:697ef44129c6 r21-2-14
Import from CVS: tag r21-2-14
author | cvs |
---|---|
date | Mon, 13 Aug 2007 11:20:41 +0200 |
parents | 74fd4e045ea6 |
children |
411:12e008d41344 | 412:697ef44129c6 |
---|---|
16 buffers, and for many other purposes. Because strings are so important, | 16 buffers, and for many other purposes. Because strings are so important, |
17 XEmacs Lisp has many functions expressly for manipulating them. XEmacs | 17 XEmacs Lisp has many functions expressly for manipulating them. XEmacs |
18 Lisp programs use strings more often than individual characters. | 18 Lisp programs use strings more often than individual characters. |
19 | 19 |
20 @menu | 20 @menu |
21 * String Basics:: Basic properties of strings and characters. | 21 * Basics: String Basics. Basic properties of strings and characters. |
22 * Predicates for Strings:: Testing whether an object is a string or char. | 22 * Predicates for Strings:: Testing whether an object is a string or char. |
23 * Creating Strings:: Functions to allocate new strings. | 23 * Creating Strings:: Functions to allocate new strings. |
24 * Predicates for Characters:: Testing whether an object is a character. | 24 * Predicates for Characters:: Testing whether an object is a character. |
25 * Character Codes:: Each character has an equivalent integer. | 25 * Character Codes:: Each character has an equivalent integer. |
26 * Text Comparison:: Comparing characters or strings. | 26 * Text Comparison:: Comparing characters or strings. |
43 determined only by how it is used. @xref{Character Type}. | 43 determined only by how it is used. @xref{Character Type}. |
44 | 44 |
45 The length of a string (like any array) is fixed and independent of | 45 The length of a string (like any array) is fixed and independent of |
46 the string contents, and cannot be altered. Strings in Lisp are | 46 the string contents, and cannot be altered. Strings in Lisp are |
47 @emph{not} terminated by a distinguished character code. (By contrast, | 47 @emph{not} terminated by a distinguished character code. (By contrast, |
48 strings in C are terminated by a character with @sc{ascii} code 0.) | 48 strings in C are terminated by a character with @sc{ASCII} code 0.) |
49 This means that any character, including the null character (@sc{ascii} | 49 This means that any character, including the null character (@sc{ASCII} |
50 code 0), is a valid element of a string.@refill | 50 code 0), is a valid element of a string.@refill |
51 | 51 |
52 Since strings are considered arrays, you can operate on them with the | 52 Since strings are considered arrays, you can operate on them with the |
53 general array functions. (@xref{Sequences Arrays Vectors}.) For | 53 general array functions. (@xref{Sequences Arrays Vectors}.) For |
54 example, you can access or change individual characters in a string | 54 example, you can access or change individual characters in a string |
319 | 319 |
320 @table @asis | 320 @table @asis |
321 @item 0 - 31 | 321 @item 0 - 31 |
322 Control set 0 | 322 Control set 0 |
323 @item 32 - 127 | 323 @item 32 - 127 |
324 @sc{ascii} | 324 @sc{ASCII} |
325 @item 128 - 159 | 325 @item 128 - 159 |
326 Control set 1 | 326 Control set 1 |
327 @item 160 - 255 | 327 @item 160 - 255 |
328 Right half of ISO-8859-1 | 328 Right half of ISO-8859-1 |
329 @end table | 329 @end table |
330 | 330 |
331 If support for @sc{mule} does not exist, these are the only valid | 331 If support for @sc{MULE} does not exist, these are the only valid |
332 character values. When @sc{mule} support exists, the values assigned to | 332 character values. When @sc{MULE} support exists, the values assigned to |
333 other characters may vary depending on the particular version of XEmacs, | 333 other characters may vary depending on the particular version of XEmacs, |
334 the order in which character sets were loaded, etc., and you should not | 334 the order in which character sets were loaded, etc., and you should not |
335 depend on them. | 335 depend on them. |
336 @end defun | 336 @end defun |
337 | 337 |
425 the character from @var{string1}, then @var{string1} is less, and this | 425 the character from @var{string1}, then @var{string1} is less, and this |
426 function returns @code{t}. If the lesser character is the one from | 426 function returns @code{t}. If the lesser character is the one from |
427 @var{string2}, then @var{string1} is greater, and this function returns | 427 @var{string2}, then @var{string1} is greater, and this function returns |
428 @code{nil}. If the two strings match entirely, the value is @code{nil}. | 428 @code{nil}. If the two strings match entirely, the value is @code{nil}. |
429 | 429 |
430 Pairs of characters are compared by their @sc{ascii} codes. Keep in | 430 Pairs of characters are compared by their @sc{ASCII} codes. Keep in |
431 mind that lower case letters have higher numeric values in the | 431 mind that lower case letters have higher numeric values in the |
432 @sc{ascii} character set than their upper case counterparts; numbers and | 432 @sc{ASCII} character set than their upper case counterparts; numbers and |
433 many punctuation characters have a lower numeric value than upper case | 433 many punctuation characters have a lower numeric value than upper case |
434 letters. | 434 letters. |
435 | 435 |
436 @example | 436 @example |
437 @group | 437 @group |
513 @defun string-to-char string | 513 @defun string-to-char string |
514 @cindex string to character | 514 @cindex string to character |
515 This function returns the first character in @var{string}. If the | 515 This function returns the first character in @var{string}. If the |
516 string is empty, the function returns 0. (Under XEmacs 19, the value is | 516 string is empty, the function returns 0. (Under XEmacs 19, the value is |
517 also 0 when the first character of @var{string} is the null character, | 517 also 0 when the first character of @var{string} is the null character, |
518 @sc{ascii} code 0.) | 518 @sc{ASCII} code 0.) |
519 | 519 |
520 @example | 520 @example |
521 (string-to-char "ABC") | 521 (string-to-char "ABC") |
522 @result{} ?A ;; @r{Under XEmacs 20.} | 522 @result{} ?A ;; @r{Under XEmacs 20.} |
523 @result{} 65 ;; @r{Under XEmacs 19.} | 523 @result{} 65 ;; @r{Under XEmacs 19.} |
608 @node String Properties | 608 @node String Properties |
609 @section String Properties | 609 @section String Properties |
610 @cindex string properties | 610 @cindex string properties |
611 @cindex properties of strings | 611 @cindex properties of strings |
612 | 612 |
613 Just as with symbols, extents, faces, and glyphs, you can attach | 613 Similar to symbols, extents, faces, and glyphs, you can attach |
614 additional information to strings in the form of @dfn{string | 614 additional information to strings in the form of @dfn{string |
615 properties}. These differ from text properties, which are logically | 615 properties}. These differ from text properties, which are logically |
616 attached to particular characters in the string. | 616 attached to particular characters in the string. |
617 | 617 |
618 To attach a property to a string, use @code{put}. To retrieve a property | 618 To attach a property to a string, use @code{put}. To retrieve a property |
619 from a string, use @code{get}. You can also use @code{remprop} to remove | 619 from a string, use @code{get}. You can also use @code{remprop} to remove |
620 a property from a string and @code{object-plist} to retrieve a list of | 620 a property from a string and @code{object-props} to retrieve a list of |
621 all the properties in a string. | 621 all the properties in a string. |
622 | 622 |
623 @node Formatting Strings | 623 @node Formatting Strings |
624 @section Formatting Strings | 624 @section Formatting Strings |
625 @cindex formatting strings | 625 @cindex formatting strings |
908 characters (the letters @samp{A} through @samp{Z} and @samp{a} through | 908 characters (the letters @samp{A} through @samp{Z} and @samp{a} through |
909 @samp{z}); other characters are not altered. The functions do not | 909 @samp{z}); other characters are not altered. The functions do not |
910 modify the strings that are passed to them as arguments. | 910 modify the strings that are passed to them as arguments. |
911 | 911 |
912 The examples below use the characters @samp{X} and @samp{x} which have | 912 The examples below use the characters @samp{X} and @samp{x} which have |
913 @sc{ascii} codes 88 and 120 respectively. | 913 @sc{ASCII} codes 88 and 120 respectively. |
914 | 914 |
915 @defun downcase string-or-char | 915 @defun downcase string-or-char |
916 This function converts a character or a string to lower case. | 916 This function converts a character or a string to lower case. |
917 | 917 |
918 When the argument to @code{downcase} is a string, the function creates | 918 When the argument to @code{downcase} is a string, the function creates |
993 You can customize case conversion by installing a special @dfn{case | 993 You can customize case conversion by installing a special @dfn{case |
994 table}. A case table specifies the mapping between upper case and lower | 994 table}. A case table specifies the mapping between upper case and lower |
995 case letters. It affects both the string and character case conversion | 995 case letters. It affects both the string and character case conversion |
996 functions (see the previous section) and those that apply to text in the | 996 functions (see the previous section) and those that apply to text in the |
997 buffer (@pxref{Case Changes}). You need a case table if you are using a | 997 buffer (@pxref{Case Changes}). You need a case table if you are using a |
998 language which has letters other than the standard @sc{ascii} letters. | 998 language which has letters other than the standard @sc{ASCII} letters. |
999 | 999 |
1000 A case table is a list of this form: | 1000 A case table is a list of this form: |
1001 | 1001 |
1002 @example | 1002 @example |
1003 (@var{downcase} @var{upcase} @var{canonicalize} @var{equivalences}) | 1003 (@var{downcase} @var{upcase} @var{canonicalize} @var{equivalences}) |
1020 equivalent; any two characters that are related by case-conversion have | 1020 equivalent; any two characters that are related by case-conversion have |
1021 the same canonical equivalent character. | 1021 the same canonical equivalent character. |
1022 | 1022 |
1023 The element @var{equivalences} is a map that cyclicly permutes each | 1023 The element @var{equivalences} is a map that cyclicly permutes each |
1024 equivalence class (of characters with the same canonical equivalent). | 1024 equivalence class (of characters with the same canonical equivalent). |
1025 (For ordinary @sc{ascii}, this would map @samp{a} into @samp{A} and | 1025 (For ordinary @sc{ASCII}, this would map @samp{a} into @samp{A} and |
1026 @samp{A} into @samp{a}, and likewise for each set of equivalent | 1026 @samp{A} into @samp{a}, and likewise for each set of equivalent |
1027 characters.) | 1027 characters.) |
1028 | 1028 |
1029 When you construct a case table, you can provide @code{nil} for | 1029 When you construct a case table, you can provide @code{nil} for |
1030 @var{canonicalize}; then Emacs fills in this string from @var{upcase} | 1030 @var{canonicalize}; then Emacs fills in this string from @var{upcase} |
1061 @defun set-case-table table | 1061 @defun set-case-table table |
1062 This sets the current buffer's case table to @var{table}. | 1062 This sets the current buffer's case table to @var{table}. |
1063 @end defun | 1063 @end defun |
1064 | 1064 |
1065 The following three functions are convenient subroutines for packages | 1065 The following three functions are convenient subroutines for packages |
1066 that define non-@sc{ascii} character sets. They modify a string | 1066 that define non-@sc{ASCII} character sets. They modify a string |
1067 @var{downcase-table} provided as an argument; this should be a string to | 1067 @var{downcase-table} provided as an argument; this should be a string to |
1068 be used as the @var{downcase} part of a case table. They also modify | 1068 be used as the @var{downcase} part of a case table. They also modify |
1069 the standard syntax table. @xref{Syntax Tables}. | 1069 the standard syntax table. @xref{Syntax Tables}. |
1070 | 1070 |
1071 @defun set-case-syntax-pair uc lc downcase-table | 1071 @defun set-case-syntax-pair uc lc downcase-table |
1107 | 1107 |
1108 Note that char tables as a primitive type, and all of the functions in | 1108 Note that char tables as a primitive type, and all of the functions in |
1109 this section, exist only in XEmacs 20. In XEmacs 19, char tables are | 1109 this section, exist only in XEmacs 20. In XEmacs 19, char tables are |
1110 generally implemented using a vector of 256 elements. | 1110 generally implemented using a vector of 256 elements. |
1111 | 1111 |
1112 When @sc{mule} support exists, the types of ranges that can be assigned | 1112 When @sc{MULE} support exists, the types of ranges that can be assigned |
1113 values are | 1113 values are |
1114 | 1114 |
1115 @itemize @bullet | 1115 @itemize @bullet |
1116 @item | 1116 @item |
1117 all characters | 1117 all characters |
1121 a single row in a two-octet charset | 1121 a single row in a two-octet charset |
1122 @item | 1122 @item |
1123 a single character | 1123 a single character |
1124 @end itemize | 1124 @end itemize |
1125 | 1125 |
1126 When @sc{mule} support is not present, the types of ranges that can be | 1126 When @sc{MULE} support is not present, the types of ranges that can be |
1127 assigned values are | 1127 assigned values are |
1128 | 1128 |
1129 @itemize @bullet | 1129 @itemize @bullet |
1130 @item | 1130 @item |
1131 all characters | 1131 all characters |
1152 @item category | 1152 @item category |
1153 Used for category tables, which specify the regexp categories | 1153 Used for category tables, which specify the regexp categories |
1154 that a character is in. The valid values are @code{nil} or a | 1154 that a character is in. The valid values are @code{nil} or a |
1155 bit vector of 95 elements. Higher-level Lisp functions are | 1155 bit vector of 95 elements. Higher-level Lisp functions are |
1156 provided for working with category tables. Currently categories | 1156 provided for working with category tables. Currently categories |
1157 and category tables only exist when @sc{mule} support is present. | 1157 and category tables only exist when @sc{MULE} support is present. |
1158 @item char | 1158 @item char |
1159 A generalized char table, for mapping from one character to | 1159 A generalized char table, for mapping from one character to |
1160 another. Used for case tables, syntax matching tables, | 1160 another. Used for case tables, syntax matching tables, |
1161 @code{keyboard-translate-table}, etc. The valid values are characters. | 1161 @code{keyboard-translate-table}, etc. The valid values are characters. |
1162 @item generic | 1162 @item generic |
1201 | 1201 |
1202 @itemize @bullet | 1202 @itemize @bullet |
1203 @item | 1203 @item |
1204 @code{t} (all characters are affected) | 1204 @code{t} (all characters are affected) |
1205 @item | 1205 @item |
1206 A charset (only allowed when @sc{mule} support is present) | 1206 A charset (only allowed when @sc{MULE} support is present) |
1207 @item | 1207 @item |
1208 A vector of two elements: a two-octet charset and a row number | 1208 A vector of two elements: a two-octet charset and a row number |
1209 (only allowed when @sc{mule} support is present) | 1209 (only allowed when @sc{MULE} support is present) |
1210 @item | 1210 @item |
1211 A single character | 1211 A single character |
1212 @end itemize | 1212 @end itemize |
1213 | 1213 |
1214 @var{val} must be a value appropriate for the type of @var{table}. | 1214 @var{val} must be a value appropriate for the type of @var{table}. |