comparison man/lispref/strings.texi @ 0:376386a54a3c r19-14

Import from CVS: tag r19-14
author cvs
date Mon, 13 Aug 2007 08:45:50 +0200
parents
children 05472e90ae02
-1:000000000000 0:376386a54a3c
1 @c -*-texinfo-*-
2 @c This is part of the XEmacs Lisp Reference Manual.
3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
4 @c See the file lispref.texi for copying conditions.
5 @setfilename ../../info/strings.info
6 @node Strings and Characters, Lists, Numbers, Top
7 @chapter Strings and Characters
8 @cindex strings
9 @cindex character arrays
10 @cindex characters
11 @cindex bytes
12
13 A string in XEmacs Lisp is an array that contains an ordered sequence
14 of characters. Strings are used as names of symbols, buffers, and
15 files, to send messages to users, to hold text being copied between
16 buffers, and for many other purposes. Because strings are so important,
17 XEmacs Lisp has many functions expressly for manipulating them. XEmacs
18 Lisp programs use strings more often than individual characters.
19
20 @menu
21 * Basics: String Basics. Basic properties of strings and characters.
22 * Predicates for Strings:: Testing whether an object is a string or char.
23 * Creating Strings:: Functions to allocate new strings.
24 * Predicates for Characters:: Testing whether an object is a character.
25 * Character Codes:: Each character has an equivalent integer.
26 * Text Comparison:: Comparing characters or strings.
27 * String Conversion:: Converting between characters, strings, and numbers.
28 * Modifying Strings:: Changing characters in a string.
29 * String Properties:: Additional information attached to strings.
30 * Formatting Strings:: @code{format}: XEmacs's analog of @code{printf}.
31 * Character Case:: Case conversion functions.
32 * Case Tables:: Customizing case conversion.
33 * Char Tables:: Mapping from characters to Lisp objects.
34 @end menu
35
36 @node String Basics
37 @section String and Character Basics
38
39 Strings in XEmacs Lisp are arrays that contain an ordered sequence of
40 characters. Characters are their own primitive object type in XEmacs
41 20. However, in XEmacs 19, characters are represented in XEmacs Lisp as
42 integers; whether an integer was intended as a character or not is
43 determined only by how it is used. @xref{Character Type}.
44
45 The length of a string (like any array) is fixed and independent of
46 the string contents, and cannot be altered. Strings in Lisp are
47 @emph{not} terminated by a distinguished character code. (By contrast,
48 strings in C are terminated by a character with @sc{ASCII} code 0.)
49 This means that any character, including the null character (@sc{ASCII}
50 code 0), is a valid element of a string.@refill
51
52 Since strings are considered arrays, you can operate on them with the
53 general array functions. (@xref{Sequences Arrays Vectors}.) For
54 example, you can access or change individual characters in a string
55 using the functions @code{aref} and @code{aset} (@pxref{Array
56 Functions}).
57
58 Strings use an efficient representation for storing the characters
59 in them, and thus take up much less memory than a vector of the same
60 length.
61
62 Sometimes you will see strings used to hold key sequences. This
63 usage exists for backward compatibility with Emacs 18, but should @emph{not}
64 be used in new code, since many key chords can't be represented at
65 all and others (in particular meta key chords) are confused with
66 accented characters.
67
68 @ignore @c Not accurate any more
69 Each character in a string is stored in a single byte. Therefore,
70 numbers not in the range 0 to 255 are truncated when stored into a
71 string. This means that a string takes up much less memory than a
72 vector of the same length.
73
74 Sometimes key sequences are represented as strings. When a string is
75 a key sequence, string elements in the range 128 to 255 represent meta
76 characters (which are extremely large integers) rather than keyboard
77 events in the range 128 to 255.
78
79 Strings cannot hold characters that have the hyper, super or alt
80 modifiers; they can hold @sc{ASCII} control characters, but no other
81 control characters. They do not distinguish case in @sc{ASCII} control
82 characters. @xref{Character Type}, for more information about
83 representation of meta and other modifiers for keyboard input
84 characters.
85 @end ignore
86
87 Strings are useful for holding regular expressions. You can also
88 match regular expressions against strings (@pxref{Regexp Search}). The
89 functions @code{match-string} (@pxref{Simple Match Data}) and
90 @code{replace-match} (@pxref{Replacing Match}) are useful for
91 decomposing and modifying strings based on regular expression matching.
92
93 Like a buffer, a string can contain extents in it. These extents are
94 created when a function such as @code{buffer-substring} is called on a
95 region with duplicable extents in it. When the string is inserted into
96 a buffer, the extents are inserted along with it. @xref{Duplicable
97 Extents}.
98
99 @xref{Text}, for information about functions that display strings or
100 copy them into buffers. @xref{Character Type}, and @ref{String Type},
101 for information about the syntax of characters and strings.
102
103 @node Predicates for Strings
104 @section The Predicates for Strings
105
106 For more information about general sequence and array predicates,
107 see @ref{Sequences Arrays Vectors}, and @ref{Arrays}.
108
109 @defun stringp object
110 This function returns @code{t} if @var{object} is a string, @code{nil}
111 otherwise.
112 @end defun
113
114 @defun char-or-string-p object
115 This function returns @code{t} if @var{object} is a string or a
116 character, @code{nil} otherwise.
117 @end defun
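
As a brief illustration (a sketch based on the descriptions above, with
sample objects chosen only for the example), the two predicates behave
as follows:

@example
(stringp "abc")
     @result{} t
(stringp ?a)
     @result{} nil
(stringp 'abc)
     @result{} nil
(char-or-string-p "abc")
     @result{} t
(char-or-string-p ?a)
     @result{} t
(char-or-string-p '(1 2))
     @result{} nil
@end example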
118
119 @node Creating Strings
120 @section Creating Strings
121
122 The following functions create strings, either from scratch, or by
123 putting strings together, or by taking them apart.
124
125 @defun make-string count character
126 This function returns a string made up of @var{count} repetitions of
127 @var{character}. If @var{count} is negative, an error is signaled.
128
129 @example
130 (make-string 5 ?x)
131 @result{} "xxxxx"
132 (make-string 0 ?x)
133 @result{} ""
134 @end example
135
136 Other functions to compare with this one include @code{char-to-string}
137 (@pxref{String Conversion}), @code{make-vector} (@pxref{Vectors}), and
138 @code{make-list} (@pxref{Building Lists}).
139 @end defun
140
141 @defun substring string start &optional end
142 This function returns a new string which consists of those characters
143 from @var{string} in the range from (and including) the character at the
144 index @var{start} up to (but excluding) the character at the index
145 @var{end}. The first character is at index zero.
146
147 @example
148 @group
149 (substring "abcdefg" 0 3)
150 @result{} "abc"
151 @end group
152 @end example
153
154 @noindent
155 Here the index for @samp{a} is 0, the index for @samp{b} is 1, and the
156 index for @samp{c} is 2. Thus, three letters, @samp{abc}, are copied
157 from the string @code{"abcdefg"}. The index 3 marks the character
158 position up to which the substring is copied. The character whose index
159 is 3 is actually the fourth character in the string.
160
161 A negative number counts from the end of the string, so that @minus{}1
162 signifies the index of the last character of the string. For example:
163
164 @example
165 @group
166 (substring "abcdefg" -3 -1)
167 @result{} "ef"
168 @end group
169 @end example
170
171 @noindent
172 In this example, the index for @samp{e} is @minus{}3, the index for
173 @samp{f} is @minus{}2, and the index for @samp{g} is @minus{}1.
174 Therefore, @samp{e} and @samp{f} are included, and @samp{g} is excluded.
175
176 When @code{nil} is used as an index, it stands for the length of the
177 string. Thus,
178
179 @example
180 @group
181 (substring "abcdefg" -3 nil)
182 @result{} "efg"
183 @end group
184 @end example
185
186 Omitting the argument @var{end} is equivalent to specifying @code{nil}.
187 It follows that @code{(substring @var{string} 0)} returns a copy of all
188 of @var{string}.
189
190 @example
191 @group
192 (substring "abcdefg" 0)
193 @result{} "abcdefg"
194 @end group
195 @end example
196
197 @noindent
198 But we recommend @code{copy-sequence} for this purpose (@pxref{Sequence
199 Functions}).
200
201 If the characters copied from @var{string} have duplicable extents or
202 text properties, those are copied into the new string also.
203 @xref{Duplicable Extents}.
204
205 A @code{wrong-type-argument} error is signaled if either @var{start} or
206 @var{end} is not an integer or @code{nil}. An @code{args-out-of-range}
207 error is signaled if @var{start} indicates a character following
208 @var{end}, or if either integer is out of range for @var{string}.
209
210 Contrast this function with @code{buffer-substring} (@pxref{Buffer
211 Contents}), which returns a string containing a portion of the text in
212 the current buffer. The beginning of a string is at index 0, but the
213 beginning of a buffer is at index 1.
214 @end defun
215
216 @defun concat &rest sequences
217 @cindex copying strings
218 @cindex concatenating strings
219 This function returns a new string consisting of the characters in the
220 arguments passed to it (along with their text properties, if any). The
221 arguments may be strings, lists of numbers, or vectors of numbers; they
222 are not themselves changed. If @code{concat} receives no arguments, it
223 returns an empty string.
224
225 @example
226 (concat "abc" "-def")
227 @result{} "abc-def"
228 (concat "abc" (list 120 (+ 256 121)) [122])
229 @result{} "abcxyz"
230 ;; @r{@code{nil} is an empty sequence.}
231 (concat "abc" nil "-def")
232 @result{} "abc-def"
233 (concat "The " "quick brown " "fox.")
234 @result{} "The quick brown fox."
235 (concat)
236 @result{} ""
237 @end example
238
239 @noindent
240 The second example above shows how characters stored in strings are
241 taken modulo 256. In other words, each character in the string is
242 stored in one byte.
243
244 The @code{concat} function always constructs a new string that is
245 not @code{eq} to any existing string.
246
247 When an argument is an integer (not a sequence of integers), it is
248 converted to a string of digits making up the decimal printed
249 representation of the integer. @strong{Don't use this feature; we plan
250 to eliminate it. If you already use this feature, change your programs
251 now!} The proper way to convert an integer to its decimal printed
252 representation is with @code{format} (@pxref{Formatting Strings}) or
253 @code{number-to-string} (@pxref{String Conversion}).
254
255 @example
256 @group
257 (concat 137)
258 @result{} "137"
259 (concat 54 321)
260 @result{} "54321"
261 @end group
262 @end example
263
264 For information about other concatenation functions, see the description
265 of @code{mapconcat} in @ref{Mapping Functions}, @code{vconcat} in
266 @ref{Vectors}, @code{bvconcat} in @ref{Bit Vectors}, and @code{append}
267 in @ref{Building Lists}.
268 @end defun
269
270 @node Predicates for Characters
271 @section The Predicates for Characters
272
273 @defun characterp object
274 This function returns @code{t} if @var{object} is a character.
275
276 Some functions that work on integers (e.g. the comparison functions
277 @code{<}, @code{<=}, @code{=}, @code{/=}, etc. and the arithmetic functions @code{+}, @code{-}, @code{*}, etc.)
278 accept characters and implicitly convert them into integers. In
279 general, functions that work on characters also accept char-ints and
280 implicitly convert them into characters. WARNING: Neither of these
281 behaviors is very desirable, and they are maintained for backward
282 compatibility with old E-Lisp programs that confounded characters and
283 integers willy-nilly. These behaviors may change in the future; therefore,
284 do not rely on them. Instead, convert the characters explicitly
285 using @code{char-int}.
286 @end defun
287
288 @defun integer-or-char-p object
289 This function returns @code{t} if @var{object} is an integer or character.
290 @end defun
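
The following sketch illustrates these predicates together with the
explicit conversion recommended above. The results assume XEmacs 20,
where characters are a distinct type; the objects tested are arbitrary
samples.

@example
(characterp ?a)
     @result{} t
(characterp "a")
     @result{} nil
(integer-or-char-p 5)
     @result{} t
(integer-or-char-p ?a)
     @result{} t
(integer-or-char-p "a")
     @result{} nil
;; @r{Convert explicitly instead of relying on implicit conversion.}
(+ (char-int ?a) 1)
     @result{} 98
@end example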
291
292 @node Character Codes
293 @section Character Codes
294
295 @defun char-int ch
296 This function converts a character into an equivalent integer.
297 The resulting integer will always be non-negative. The integers in
298 the range 0 - 255 map to characters as follows:
299
300 @table @asis
301 @item 0 - 31
302 Control set 0
303 @item 32 - 127
304 @sc{ASCII}
305 @item 128 - 159
306 Control set 1
307 @item 160 - 255
308 Right half of ISO-8859-1
309 @end table
310
311 If support for @sc{MULE} does not exist, these are the only valid
312 character values. When @sc{MULE} support exists, the values assigned to
313 other characters may vary depending on the particular version of XEmacs,
314 the order in which character sets were loaded, etc., and you should not
315 depend on them.
316 @end defun
317
318 @defun int-char integer
319 This function converts an integer into the equivalent character. Not
320 all integers correspond to valid characters; use @code{char-int-p} to
321 determine whether this is the case. If the integer cannot be converted,
322 @code{nil} is returned.
323 @end defun
324
325 @defun char-int-p object
326 This function returns @code{t} if @var{object} is an integer that can be
327 converted into a character.
328 @end defun
329
330 @defun char-or-char-int-p object
331 This function returns @code{t} if @var{object} is a character or an
332 integer that can be converted into one.
333 @end defun
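
A short sketch of how these four functions fit together, assuming
XEmacs 20 (under XEmacs 19 a character is itself an integer, so the
character results would print as numbers). The sample values are
illustrative only.

@example
(char-int ?a)
     @result{} 97
(int-char 97)
     @result{} ?a
(char-int-p 97)
     @result{} t
;; @r{Character codes are never negative, so -1 cannot be converted.}
(char-int-p -1)
     @result{} nil
(int-char -1)
     @result{} nil
(char-or-char-int-p ?a)
     @result{} t
(char-or-char-int-p "a")
     @result{} nil
@end example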
334
335 @need 2000
336 @node Text Comparison
337 @section Comparison of Characters and Strings
338 @cindex string equality
339
340 @defun char-equal character1 character2
341 This function returns @code{t} if the arguments represent the same
342 character, @code{nil} otherwise. This function ignores differences
343 in case if @code{case-fold-search} is non-@code{nil}.
344
345 @example
346 (char-equal ?x ?x)
347 @result{} t
348 (char-to-string (+ 256 ?x))
349 @result{} "x"
350 (char-equal ?x (+ 256 ?x))
351 @result{} t
352 @end example
353 @end defun
354
355 @defun string= string1 string2
356 This function returns @code{t} if the characters of the two strings
357 match exactly; case is significant.
358
359 @example
360 (string= "abc" "abc")
361 @result{} t
362 (string= "abc" "ABC")
363 @result{} nil
364 (string= "ab" "ABC")
365 @result{} nil
366 @end example
367
368 @ignore @c `equal' in XEmacs does not compare text properties
369 The function @code{string=} ignores the text properties of the
370 two strings. To compare strings in a way that compares their text
371 properties also, use @code{equal} (@pxref{Equality Predicates}).
372 @end ignore
373 @end defun
374
375 @defun string-equal string1 string2
376 @code{string-equal} is another name for @code{string=}.
377 @end defun
378
379 @cindex lexical comparison
380 @defun string< string1 string2
381 @c (findex string< causes problems for permuted index!!)
382 This function compares two strings a character at a time. First it
383 scans both the strings at once to find the first pair of corresponding
384 characters that do not match. If the lesser character of those two is
385 the character from @var{string1}, then @var{string1} is less, and this
386 function returns @code{t}. If the lesser character is the one from
387 @var{string2}, then @var{string1} is greater, and this function returns
388 @code{nil}. If the two strings match entirely, the value is @code{nil}.
389
390 Pairs of characters are compared by their @sc{ASCII} codes. Keep in
391 mind that lower case letters have higher numeric values in the
392 @sc{ASCII} character set than their upper case counterparts; numbers and
393 many punctuation characters have a lower numeric value than upper case
394 letters.
395
396 @example
397 @group
398 (string< "abc" "abd")
399 @result{} t
400 (string< "abd" "abc")
401 @result{} nil
402 (string< "123" "abc")
403 @result{} t
404 @end group
405 @end example
406
407 When the strings have different lengths, and they match up to the
408 length of @var{string1}, then the result is @code{t}. If they match up
409 to the length of @var{string2}, the result is @code{nil}. A string of
410 no characters is less than any other string.
411
412 @example
413 @group
414 (string< "" "abc")
415 @result{} t
416 (string< "ab" "abc")
417 @result{} t
418 (string< "abc" "")
419 @result{} nil
420 (string< "abc" "ab")
421 @result{} nil
422 (string< "" "")
423 @result{} nil
424 @end group
425 @end example
426 @end defun
427
428 @defun string-lessp string1 string2
429 @code{string-lessp} is another name for @code{string<}.
430 @end defun
431
432 See also @code{compare-buffer-substrings} in @ref{Comparing Text}, for
433 a way to compare text in buffers. The function @code{string-match},
434 which matches a regular expression against a string, can be used
435 for a kind of string comparison; see @ref{Regexp Search}.
436
437 @node String Conversion
438 @section Conversion of Characters and Strings
439 @cindex conversion of strings
440
441 This section describes functions for conversions between characters,
442 strings and integers. @code{format} and @code{prin1-to-string}
443 (@pxref{Output Functions}) can also convert Lisp objects into strings.
444 @code{read-from-string} (@pxref{Input Functions}) can ``convert'' a
445 string representation of a Lisp object into an object.
446
447 @xref{Documentation}, for functions that produce textual descriptions
448 of text characters and general input events
449 (@code{single-key-description} and @code{text-char-description}). These
450 functions are used primarily for making help messages.
451
452 @defun char-to-string character
453 @cindex character to string
454 This function returns a new string with a length of one character.
455 The value of @var{character}, modulo 256, is used to initialize the
456 element of the string.
457
458 This function is similar to @code{make-string} with an integer argument
459 of 1. (@xref{Creating Strings}.) This conversion can also be done with
460 @code{format} using the @samp{%c} format specification.
461 (@xref{Formatting Strings}.)
462
463 @example
464 (char-to-string ?x)
465 @result{} "x"
466 (char-to-string (+ 256 ?x))
467 @result{} "x"
468 (make-string 1 ?x)
469 @result{} "x"
470 @end example
471 @end defun
472
473 @defun string-to-char string
474 @cindex string to character
475 This function returns the first character in @var{string}. If the
476 string is empty, the function returns 0. (Under XEmacs 19, the value is
477 also 0 when the first character of @var{string} is the null character,
478 @sc{ASCII} code 0.)
479
480 @example
481 (string-to-char "ABC")
482 @result{} ?A ;; @r{Under XEmacs 20.}
483 @result{} 65 ;; @r{Under XEmacs 19.}
484 (string-to-char "xyz")
485 @result{} ?x ;; @r{Under XEmacs 20.}
486 @result{} 120 ;; @r{Under XEmacs 19.}
487 (string-to-char "")
488 @result{} 0
489 (string-to-char "\000")
490 @result{} ?\^@ ;; @r{Under XEmacs 20.}
491 @result{} 0 ;; @r{Under XEmacs 19.}
492 @end example
493
494 This function may be eliminated in the future if it does not seem useful
495 enough to retain.
496 @end defun
497
498 @defun number-to-string number
499 @cindex integer to string
500 @cindex integer to decimal
501 This function returns a string consisting of the printed
502 representation of @var{number}, which may be an integer or a floating
503 point number. The value starts with a sign if the argument is
504 negative.
505
506 @example
507 (number-to-string 256)
508 @result{} "256"
509 (number-to-string -23)
510 @result{} "-23"
511 (number-to-string -23.5)
512 @result{} "-23.5"
513 @end example
514
515 @findex int-to-string
516 @code{int-to-string} is a semi-obsolete alias for this function.
517
518 See also the function @code{format} in @ref{Formatting Strings}.
519 @end defun
520
521 @defun string-to-number string
522 @cindex string to number
523 This function returns the numeric value of the characters in
524 @var{string}, read in base ten. It skips spaces and tabs at the
525 beginning of @var{string}, then reads as much of @var{string} as it can
526 interpret as a number. (On some systems it ignores other whitespace at
527 the beginning, not just spaces and tabs.) If the first character after
528 the ignored whitespace is not a digit or a minus sign, this function
529 returns 0.
530
531 @example
532 (string-to-number "256")
533 @result{} 256
534 (string-to-number "25 is a perfect square.")
535 @result{} 25
536 (string-to-number "X256")
537 @result{} 0
538 (string-to-number "-4.5")
539 @result{} -4.5
540 @end example
541
542 @findex string-to-int
543 @code{string-to-int} is an obsolete alias for this function.
544 @end defun
545
546 @node Modifying Strings
547 @section Modifying Strings
548 @cindex strings, modifying
549
550 You can modify a string using the general array-modifying primitives.
551 @xref{Arrays}. The function @code{aset} modifies a single character;
552 the function @code{fillarray} sets all characters in the string to
553 a specified character.
554
555 Each string has a tick counter that starts out at zero (when the string
556 is created) and is incremented each time a change is made to that
557 string.
558
559 @defun string-modified-tick string
560 This function returns the tick counter for @var{string}.
561 @end defun
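
One possible use of the tick counter is to detect whether a string has
been changed since it was last examined. The sketch below relies only on
the behavior described above (the counter increases with each change);
the variable names @code{s} and @code{before} are hypothetical.

@example
(let* ((s (make-string 3 ?a))
       (before (string-modified-tick s)))
  ;; @r{Changing a character counts as a modification.}
  (aset s 0 ?b)
  (> (string-modified-tick s) before))
     @result{} t
@end example

@noindent
Comparing two readings of the counter, rather than testing for a
particular value, avoids depending on how many increments a given
operation performs.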
562
563 @node String Properties
564 @section String Properties
565 @cindex string properties
566 @cindex properties of strings
567
568 As with symbols, extents, faces, and glyphs, you can attach
569 additional information to strings in the form of @dfn{string
570 properties}. These differ from text properties, which are logically
571 attached to particular characters in the string.
572
573 To attach a property to a string, use @code{put}. To retrieve a property
574 from a string, use @code{get}. You can also use @code{remprop} to remove
575 a property from a string and @code{object-props} to retrieve a list of
576 all the properties of a string.
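
The following is a minimal sketch of attaching, reading, and removing a
string property with the functions named above. The property name
@code{origin} and its value @code{user-input} are hypothetical, chosen
only for illustration.

@example
(let ((s (copy-sequence "hello")))
  (put s 'origin 'user-input)
  (get s 'origin))
     @result{} user-input

(let ((s (copy-sequence "hello")))
  (put s 'origin 'user-input)
  (remprop s 'origin)
  ;; @r{The property is gone, so @code{get} finds nothing.}
  (get s 'origin))
     @result{} nil
@end example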
577
578 @node Formatting Strings
579 @section Formatting Strings
580 @cindex formatting strings
581 @cindex strings, formatting them
582
583 @dfn{Formatting} means constructing a string by substitution of
584 computed values at various places in a constant string. This string
585 controls how the other values are printed as well as where they appear;
586 it is called a @dfn{format string}.
587
588 Formatting is often useful for computing messages to be displayed. In
589 fact, the functions @code{message} and @code{error} provide the same
590 formatting feature described here; they differ from @code{format} only
591 in how they use the result of formatting.
592
593 @defun format string &rest objects
594 This function returns a new string that is made by copying
595 @var{string} and then replacing any format specification
596 in the copy with encodings of the corresponding @var{objects}. The
597 arguments @var{objects} are the computed values to be formatted.
598 @end defun
599
600 @cindex @samp{%} in format
601 @cindex format specification
602 A format specification is a sequence of characters beginning with a
603 @samp{%}. Thus, if there is a @samp{%d} in @var{string}, the
604 @code{format} function replaces it with the printed representation of
605 one of the values to be formatted (one of the arguments @var{objects}).
606 For example:
607
608 @example
609 @group
610 (format "The value of fill-column is %d." fill-column)
611 @result{} "The value of fill-column is 72."
612 @end group
613 @end example
614
615 If @var{string} contains more than one format specification, the
616 format specifications correspond with successive values from
617 @var{objects}. Thus, the first format specification in @var{string}
618 uses the first such value, the second format specification uses the
619 second such value, and so on. Any extra format specifications (those
620 for which there are no corresponding values) cause unpredictable
621 behavior. Any extra values to be formatted are ignored.
622
623 Certain format specifications require values of particular types.
624 However, no error is signaled if the value actually supplied fails to
625 have the expected type. Instead, the output is likely to be
626 meaningless.
627
628 Here is a table of valid format specifications:
629
630 @table @samp
631 @item %s
632 Replace the specification with the printed representation of the object,
633 made without quoting. Thus, strings are represented by their contents
634 alone, with no @samp{"} characters, and symbols appear without @samp{\}
635 characters. This is equivalent to printing the object with @code{princ}.
636
637 If there is no corresponding object, the empty string is used.
638
639 @item %S
640 Replace the specification with the printed representation of the object,
641 made with quoting. Thus, strings are enclosed in @samp{"} characters,
642 and @samp{\} characters appear where necessary before special characters.
643 This is equivalent to printing the object with @code{prin1}.
644
645 If there is no corresponding object, the empty string is used.
646
647 @item %o
648 @cindex integer to octal
649 Replace the specification with the base-eight representation of an
650 integer.
651
652 @item %d
653 @itemx %i
654 Replace the specification with the base-ten representation of an
655 integer.
656
657 @item %x
658 @cindex integer to hexadecimal
659 Replace the specification with the base-sixteen representation of an
660 integer, using lowercase letters.
661
662 @item %X
663 @cindex integer to hexadecimal
664 Replace the specification with the base-sixteen representation of an
665 integer, using uppercase letters.
666
667 @item %c
668 Replace the specification with the character which is the value given.
669
670 @item %e
671 Replace the specification with the exponential notation for a floating
672 point number (e.g. @samp{7.852000e+03}).
673
674 @item %f
675 Replace the specification with the decimal-point notation for a floating
676 point number.
677
678 @item %g
679 Replace the specification with notation for a floating point number,
680 using a ``pretty format''. Either exponential notation or decimal-point
681 notation will be used (usually whichever is shorter), and trailing
682 zeroes are removed from the fractional part.
683
684 @item %%
685 A single @samp{%} is placed in the string. This format specification is
686 unusual in that it does not use a value. For example, @code{(format "%%
687 %d" 30)} returns @code{"% 30"}.
688 @end table
689
690 Any other format character results in an @samp{Invalid format
691 operation} error.
692
693 Here are several examples:
694
695 @example
696 @group
697 (format "The name of this buffer is %s." (buffer-name))
698 @result{} "The name of this buffer is strings.texi."
699
700 (format "The buffer object prints as %s." (current-buffer))
701 @result{} "The buffer object prints as #<buffer strings.texi>."
702
703 (format "The octal value of %d is %o,
704 and the hex value is %x." 18 18 18)
705 @result{} "The octal value of 18 is 22,
706 and the hex value is 12."
707 @end group
708 @end example
709
710 There are many additional flags and specifications that can occur
711 between the @samp{%} and the format character, in the following order:
712
713 @enumerate
714 @item
715 An optional repositioning specification, which is a positive
716 integer followed by a @samp{$}.
717
718 @item
719 Zero or more of the optional flag characters @samp{-}, @samp{+},
720 @samp{ }, @samp{0}, and @samp{#}.
721
722 @item
723 An optional minimum field width.
724
725 @item
726 An optional precision, preceded by a @samp{.} character.
727 @end enumerate
728
729 @cindex repositioning format arguments
730 @cindex multilingual string formatting
731 A @dfn{repositioning} specification changes which argument to
732 @code{format} is used by the current and all following format
733 specifications. Normally the first specification uses the first
734 argument, the second specification uses the second argument, etc. Using
735 a repositioning specification, you can change this. By placing a number
736 @var{N} followed by a @samp{$} between the @samp{%} and the format
737 character, you cause the specification to use the @var{N}th argument.
738 The next specification will use the @var{N}+1'th argument, etc.
739
740 For example:
741
742 @example
743 @group
744 (format "Can't find file `%s' in directory `%s'."
745 "ignatius.c" "loyola/")
746 @result{} "Can't find file `ignatius.c' in directory `loyola/'."
747
748 (format "In directory `%2$s', the file `%1$s' was not found."
749 "ignatius.c" "loyola/")
750 @result{} "In directory `loyola/', the file `ignatius.c' was not found."
751
752 (format
753 "The numbers %d and %d are %1$x and %x in hex and %1$o and %o in octal."
754 37 12)
755 @result{} "The numbers 37 and 12 are 25 and c in hex and 45 and 14 in octal."
756 @end group
757 @end example
758
759 As you can see, this lets you reprocess arguments more than once or
760 reword a format specification (thereby moving the arguments around)
761 without having to actually reorder the arguments. This is especially
762 useful in translating messages from one language to another: Different
763 languages use different word orders, and this sometimes entails changing
764 the order of the arguments. By using repositioning specifications,
765 this can be accomplished without having to embed knowledge of particular
766 languages into the location in the program's code where the message is
767 displayed.
768
769 @cindex numeric prefix
770 @cindex field width
771 @cindex padding
772 All the specification characters allow an optional numeric prefix
773 between the @samp{%} and the character, and following any repositioning
774 specification or flag. The optional numeric prefix defines the minimum
775 width for the object. If the printed representation of the object
776 contains fewer characters than this, then it is padded. The padding is
777 normally on the left, but will be on the right if the @samp{-} flag
778 character is given. The padding character is normally a space, but if
779 the @samp{0} flag character is given, zeros are used for padding.
780
781 @example
782 (format "%06d is padded on the left with zeros" 123)
783 @result{} "000123 is padded on the left with zeros"
784
785 (format "%-6d is padded on the right" 123)
786 @result{} "123    is padded on the right"
787 @end example
788
789 @code{format} never truncates an object's printed representation, no
790 matter what width you specify. Thus, you can use a numeric prefix to
791 specify a minimum spacing between columns with no risk of losing
792 information.
793
794 In the following three examples, @samp{%7s} specifies a minimum width
795 of 7. In the first case, the string inserted in place of @samp{%7s} has
796 only 3 letters, so 4 blank spaces are inserted for padding. In the
797 second case, the string @code{"specification"} is 13 letters wide but is
798 not truncated. In the third case, the padding is on the right.
799
800 @smallexample
801 @group
802 (format "The word `%7s' actually has %d letters in it."
803 "foo" (length "foo"))
804 @result{} "The word `    foo' actually has 3 letters in it."
805 @end group
806
807 @group
808 (format "The word `%7s' actually has %d letters in it."
809 "specification" (length "specification"))
810 @result{} "The word `specification' actually has 13 letters in it."
811 @end group
812
813 @group
814 (format "The word `%-7s' actually has %d letters in it."
815 "foo" (length "foo"))
816 @result{} "The word `foo    ' actually has 3 letters in it."
817 @end group
818 @end smallexample
819
820 @cindex format precision
821 @cindex precision of formatted numbers
822 After any minimum field width, a precision may be specified by
823 preceding it with a @samp{.} character. The precision specifies the
824 minimum number of digits to appear in @samp{%d}, @samp{%i}, @samp{%o},
825 @samp{%x}, and @samp{%X} conversions (the number is padded on the left
826 with zeroes as necessary); the number of digits printed after the
827 decimal point for @samp{%f}, @samp{%e}, and @samp{%E} conversions; the
828 number of significant digits printed in @samp{%g} and @samp{%G}
829 conversions; and the maximum number of non-padding characters printed in
830 @samp{%s} and @samp{%S} conversions. The default precision for
831 floating-point conversions is six.
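
As a brief illustration of the precision rules above, the following
calls are a sketch rather than an exhaustive list; the argument values
are arbitrary and chosen only for the example.

@example
;; Minimum number of digits for an integer conversion.
(format "%.3d" 7)
     @result{} "007"

;; Digits printed after the decimal point.
(format "%.2f" 3.14159)
     @result{} "3.14"

;; Number of significant digits.
(format "%.3g" 1234.5678)
     @result{} "1.23e+03"

;; Maximum number of characters taken from a string.
(format "%.3s" "specification")
     @result{} "spe"
@end example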
832
833 The other flag characters have the following meanings:
834
835 @itemize @bullet
836 @item
837 The @samp{ } flag means prefix non-negative numbers with a space.
838
839 @item
840 The @samp{+} flag means prefix non-negative numbers with a plus sign.
841
842 @item
843 The @samp{#} flag means print numbers in an alternate, more verbose
844 format: octal numbers begin with zero; hex numbers begin with a
845 @samp{0x} or @samp{0X}; a decimal point is printed in @samp{%f},
846 @samp{%e}, and @samp{%E} conversions even if no digits are printed
847 after it; and trailing zeroes are not omitted in @samp{%g} and @samp{%G}
848 conversions.
849 @end itemize
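
The flag characters listed above can be sketched with a few calls; the
numbers used here are arbitrary and serve only as an illustration.

@example
;; Space and plus flags on a non-negative number.
(format "% d" 42)
     @result{} " 42"
(format "%+d" 42)
     @result{} "+42"

;; Alternate forms for octal and hex.
(format "%#o" 8)
     @result{} "010"
(format "%#x" 255)
     @result{} "0xff"

;; Decimal point kept even though no digits follow it.
(format "%#.0f" 3.0)
     @result{} "3."
@end example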
850
851 @node Character Case
852 @section Character Case
853 @cindex upper case
854 @cindex lower case
855 @cindex character case
856
857 The character case functions change the case of single characters or
858 of the contents of strings. The functions convert only alphabetic
859 characters (the letters @samp{A} through @samp{Z} and @samp{a} through
860 @samp{z}); other characters are not altered. The functions do not
861 modify the strings that are passed to them as arguments.
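
As a minimal sketch of the behavior just described, using an arbitrary
string: the non-alphabetic characters pass through unchanged, and the
original string is the same after the calls as it was before.

@example
(let ((original "Mixed Case 42"))
  ;; The third element is the argument itself, still unmodified.
  (list (downcase original) (upcase original) original))
     @result{} ("mixed case 42" "MIXED CASE 42" "Mixed Case 42")
@end example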
862
863 The examples below use the characters @samp{X} and @samp{x}, which have
864 @sc{ASCII} codes 88 and 120, respectively.
865
866 @defun downcase string-or-char
867 This function converts a character or a string to lower case.
868
869 When the argument to @code{downcase} is a string, the function creates
870 and returns a new string in which each letter in the argument that is
871 upper case is converted to lower case. When the argument to
872 @code{downcase} is a character, @code{downcase} returns the
873 corresponding lower case character. (This value is actually an integer
874 under XEmacs 19.) If the original character is lower case, or is not a
875 letter, then the value equals the original character.
876
877 @example
878 (downcase "The cat in the hat")
879 @result{} "the cat in the hat"
880
881 (downcase ?X)
882 @result{} ?x ;; @r{Under XEmacs 20.}
883 @result{} 120 ;; @r{Under XEmacs 19.}
884
885 @end example
886 @end defun
887
888 @defun upcase string-or-char
889 This function converts a character or a string to upper case.
890
891 When the argument to @code{upcase} is a string, the function creates
892 and returns a new string in which each letter in the argument that is
893 lower case is converted to upper case.
894
895 When the argument to @code{upcase} is a character, @code{upcase} returns
896 the corresponding upper case character. (This value is actually an
897 integer under XEmacs 19.) If the original character is upper case, or
898 is not a letter, then the value equals the original character.
899
900 @example
901 (upcase "The cat in the hat")
902 @result{} "THE CAT IN THE HAT"
903
904 (upcase ?x)
905 @result{} ?X ;; @r{Under XEmacs 20.}
906 @result{} 88 ;; @r{Under XEmacs 19.}
907 @end example
908 @end defun
909
910 @defun capitalize string-or-char
911 @cindex capitalization
912 This function capitalizes strings or characters. If
913 @var{string-or-char} is a string, the function creates and returns a new
914 string, whose contents are a copy of @var{string-or-char} in which each
915 word has been capitalized. This means that the first character of each
916 word is converted to upper case, and the rest are converted to lower
917 case.
918
919 The definition of a word is any sequence of consecutive characters that
920 are assigned to the word constituent syntax class in the current syntax
921 table (@pxref{Syntax Class Table}).
922
923 When the argument to @code{capitalize} is a character, @code{capitalize}
924 has the same result as @code{upcase}.
925
926 @example
927 (capitalize "The cat in the hat")
928 @result{} "The Cat In The Hat"
929
930 (capitalize "THE 77TH-HATTED CAT")
931 @result{} "The 77th-Hatted Cat"
932
933 @group
934 (capitalize ?x)
935 @result{} ?X ;; @r{Under XEmacs 20.}
936 @result{} 88 ;; @r{Under XEmacs 19.}
937 @end group
938 @end example
939 @end defun
940
941 @node Case Tables
942 @section The Case Table
943
944 You can customize case conversion by installing a special @dfn{case
945 table}. A case table specifies the mapping between upper case and lower
946 case letters. It affects both the string and character case conversion
947 functions (see the previous section) and those that apply to text in the
948 buffer (@pxref{Case Changes}). You need a case table if you are using a
949 language which has letters other than the standard @sc{ASCII} letters.
950
951 A case table is a list of this form:
952
953 @example
954 (@var{downcase} @var{upcase} @var{canonicalize} @var{equivalences})
955 @end example
956
957 @noindent
958 where each element is either @code{nil} or a string of length 256. The
959 element @var{downcase} says how to map each character to its lower-case
960 equivalent. The element @var{upcase} maps each character to its
961 upper-case equivalent. If lower and upper case characters are in
962 one-to-one correspondence, use @code{nil} for @var{upcase}; then XEmacs
963 deduces the upcase table from @var{downcase}.
964
965 For some languages, upper and lower case letters are not in one-to-one
966 correspondence. There may be two different lower case letters with the
967 same upper case equivalent. In these cases, you need to specify the
968 maps for both directions.
969
970 The element @var{canonicalize} maps each character to a canonical
971 equivalent; any two characters that are related by case-conversion have
972 the same canonical equivalent character.
973
974 The element @var{equivalences} is a map that cyclically permutes each
975 equivalence class (of characters with the same canonical equivalent).
976 (For ordinary @sc{ASCII}, this would map @samp{a} into @samp{A} and
977 @samp{A} into @samp{a}, and likewise for each set of equivalent
978 characters.)
979
980 When you construct a case table, you can provide @code{nil} for
981 @var{canonicalize}; then XEmacs fills in this string from @var{upcase}
982 and @var{downcase}. You can also provide @code{nil} for
983 @var{equivalences}; then XEmacs fills in this string from
984 @var{canonicalize}. In a case table that is actually in use, those
985 components are non-@code{nil}. Do not try to specify @var{equivalences}
986 without also specifying @var{canonicalize}.
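
As a sketch of the list form described above, the following builds a
@var{downcase} string for plain @sc{ASCII} and leaves the other three
elements @code{nil} so that they can be filled in automatically. The
function name @code{my-ascii-case-table} is hypothetical, and the code
assumes XEmacs 19, where characters are integers and can be used
directly in arithmetic.

@example
(defun my-ascii-case-table ()
  "Return a case table list mapping A-Z to a-z (hypothetical helper)."
  (let ((down (make-string 256 0))
        (i 0))
    (while (< i 256)
      ;; Upper case ASCII letters map to the code 32 higher;
      ;; every other code maps to itself.
      (aset down i (if (and (>= i ?A) (<= i ?Z)) (+ i 32) i))
      (setq i (1+ i)))
    ;; upcase, canonicalize and equivalences are left as nil.
    (list down nil nil nil)))
@end example

A list built this way could be checked with @code{case-table-p} before
it is installed.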
987
988 Each buffer has a case table. XEmacs also has a @dfn{standard case
989 table} which is copied into each buffer when you create the buffer.
990 Changing the standard case table doesn't affect any existing buffers.
991
992 Here are the functions for working with case tables:
993
994 @defun case-table-p object
995 This predicate returns non-@code{nil} if @var{object} is a valid case
996 table.
997 @end defun
998
999 @defun set-standard-case-table table
1000 This function makes @var{table} the standard case table, so that it will
1001 apply to any buffers created subsequently.
1002 @end defun
1003
1004 @defun standard-case-table
1005 This returns the standard case table.
1006 @end defun
1007
1008 @defun current-case-table
1009 This function returns the current buffer's case table.
1010 @end defun
1011
1012 @defun set-case-table table
1013 This sets the current buffer's case table to @var{table}.
1014 @end defun
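
For illustration only, the functions above might be combined as follows
to give the current buffer the standard case table again. This is a
sketch of how the calls fit together, not a recommended idiom.

@example
(let ((table (standard-case-table)))
  ;; Only install the table if it passes the validity check.
  (if (case-table-p table)
      (set-case-table table)))
@end example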
1015
1016 The following three functions are convenient subroutines for packages
1017 that define non-@sc{ASCII} character sets. They modify a string
1018 @var{downcase-table} provided as an argument; this should be a string to
1019 be used as the @var{downcase} part of a case table. They also modify
1020 the standard syntax table. @xref{Syntax Tables}.
1021
1022 @defun set-case-syntax-pair uc lc downcase-table
1023 This function specifies a pair of corresponding letters, one upper case
1024 and one lower case.
1025 @end defun
1026
1027 @defun set-case-syntax-delims l r downcase-table
1028 This function makes characters @var{l} and @var{r} a matching pair of
1029 case-invariant delimiters.
1030 @end defun
1031
1032 @defun set-case-syntax char syntax downcase-table
1033 This function makes @var{char} case-invariant, with syntax
1034 @var{syntax}.
1035 @end defun
1036
1037 @deffn Command describe-buffer-case-table
1038 This command displays a description of the contents of the current
1039 buffer's case table.
1040 @end deffn
1041
1042 @cindex ISO Latin 1
1043 @pindex iso-syntax
1044 You can load the library @file{iso-syntax} to set up the standard syntax
1045 table and define a case table for the 8-bit ISO Latin 1 character set.
1046
1047 @node Char Tables
1048 @section The Char Table
1049
1050 A char table is a table that maps characters (or ranges of characters)
1051 to values. Char tables are specialized for characters, only allowing
1052 particular sorts of ranges to be assigned values. Although this
1053 loses in generality, it makes for extremely fast (constant-time)
1054 lookups, and thus is feasible for applications that do an extremely
1055 large number of lookups (e.g. scanning a buffer for a character in
1056 a particular syntax, where a lookup in the syntax table must occur
1057 once per character).
1058
1059 Note that char tables as a primitive type, and all of the functions in
1060 this section, exist only in XEmacs 20. In XEmacs 19, char tables are
1061 generally implemented using a vector of 256 elements.
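
A sketch of that XEmacs 19 convention, assuming characters are integers
below 256 so that a character can index the vector directly; the value
@code{vowel} is an arbitrary symbol chosen for the example.

@example
(let ((table (make-vector 256 nil)))
  ;; Store a value for one character, then look up two characters.
  (aset table ?a 'vowel)
  (list (aref table ?a) (aref table ?b)))
     @result{} (vowel nil)
@end example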
1062
1063 When @sc{MULE} support exists, the types of ranges that can be assigned
1064 values are
1065
1066 @itemize @bullet
1067 @item
1068 all characters
1069 @item
1070 an entire charset
1071 @item
1072 a single row in a two-octet charset
1073 @item
1074 a single character
1075 @end itemize
1076
1077 When @sc{MULE} support is not present, the types of ranges that can be
1078 assigned values are
1079
1080 @itemize @bullet
1081 @item
1082 all characters
1083 @item
1084 a single character
1085 @end itemize
1086
1087 @defun char-table-p object
1088 This function returns non-@code{nil} if @var{object} is a char table.
1089 @end defun
1090
1091 @menu
1092 * Char Table Types:: Char tables have different uses.
1093 * Working With Char Tables:: Creating and working with char tables.
1094 @end menu
1095
1096 @node Char Table Types
1097 @subsection Char Table Types
1098
1099 Each char table type is used for a different purpose and allows different
1100 sorts of values. The different char table types are
1101
1102 @table @code
1103 @item category
1104 Used for category tables, which specify the regexp categories
1105 that a character is in. The valid values are @code{nil} or a
1106 bit vector of 95 elements. Higher-level Lisp functions are
1107 provided for working with category tables. Currently categories
1108 and category tables only exist when @sc{MULE} support is present.
1109 @item char
1110 A generalized char table, for mapping from one character to
1111 another. Used for case tables, syntax matching tables,
1112 @code{keyboard-translate-table}, etc. The valid values are characters.
1113 @item generic
1114 An even more generalized char table, for mapping from a
1115 character to anything.
1116 @item display
1117 Used for display tables, which specify how a particular character
1118 is to appear when displayed. #### Not yet implemented.
1119 @item syntax
1120 Used for syntax tables, which specify the syntax of a particular
1121 character. Higher-level Lisp functions are provided for
1122 working with syntax tables. The valid values are integers.
1123 @end table
1124
1125 @defun char-table-type table
1126 This function returns the type of char table @var{table}.
1127 @end defun
1128
1129 @defun char-table-type-list
1130 This function returns a list of the recognized char table types.
1131 @end defun
1132
1133 @defun valid-char-table-type-p type
1134 This function returns @code{t} if @var{type} is a recognized char table type.
1135 @end defun
1136
1137 @node Working With Char Tables
1138 @subsection Working With Char Tables
1139
1140 @defun make-char-table type
1141 This function makes a new, empty char table of type @var{type}.
1142 @var{type} should be a symbol, one of @code{char}, @code{category},
1143 @code{display}, @code{generic}, or @code{syntax}.
1144 @end defun
1145
1146 @defun put-char-table range val table
1147 This function sets the value for chars in @var{range} to be @var{val} in
1148 @var{table}.
1149
1150 @var{range} specifies one or more characters to be affected and should be
1151 one of the following:
1152
1153 @itemize @bullet
1154 @item
1155 @code{t} (all characters are affected)
1156 @item
1157 A charset (only allowed when @sc{MULE} support is present)
1158 @item
1159 A vector of two elements: a two-octet charset and a row number
1160 (only allowed when @sc{MULE} support is present)
1161 @item
1162 A single character
1163 @end itemize
1164
1165 @var{val} must be a value appropriate for the type of @var{table}.
1166 @end defun
1167
1168 @defun get-char-table ch table
1169 This function finds the value for char @var{ch} in @var{table}.
1170 @end defun
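
The following sketch stores one value for all characters and then a
different value for a single character, and looks both up again. It
assumes XEmacs 20, where these functions exist, and uses arbitrary
symbols as values; no output is shown because it has not been verified.

@example
(let ((table (make-char-table 'generic)))
  ;; `t' as the range affects every character.
  (put-char-table t 'other table)
  ;; A single character as the range affects only that character.
  (put-char-table ?a 'vowel table)
  ;; Look up the character set individually and one that was not.
  (list (get-char-table ?a table)
        (get-char-table ?b table)))
@end example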
1171
1172 @defun get-range-char-table range table &optional multi
1173 This function finds the value for a range in @var{table}. If the range
1174 contains more than one value, @var{multi} is returned (it defaults to @code{nil}).
1175 @end defun
1176
1177 @defun reset-char-table table
1178 This function resets a char table to its default state.
1179 @end defun
1180
1181 @defun map-char-table function table &optional range
1182 This function maps @var{function} over the entries in @var{table}, calling
1183 it with two arguments, the key and the value of each entry.
1184
1185 @var{range} specifies a subrange to map over and is in the same format
1186 as the @var{range} argument to @code{put-char-table}. If omitted or
1187 @code{t}, it defaults to the entire table.
1188 @end defun
1189
1190 @defun valid-char-table-value-p value char-table-type
1191 This function returns non-@code{nil} if @var{value} is a valid value for
1192 @var{char-table-type}.
1193 @end defun
1194
1195 @defun check-valid-char-table-value value char-table-type
1196 This function signals an error if @var{value} is not a valid value for
1197 @var{char-table-type}.
1198 @end defun
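
A sketch of how the validity check might guard an assignment, assuming
XEmacs 20. The helper name @code{my-put-char-checked} is hypothetical
and is not part of XEmacs.

@example
(defun my-put-char-checked (ch value table)
  "Set the value for CH in TABLE to VALUE, after validating VALUE."
  ;; Signals an error if VALUE does not suit this table's type.
  (check-valid-char-table-value value (char-table-type table))
  (put-char-table ch value table))
@end example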