comparison src/insdel.c @ 771:943eaba38521

[xemacs-hg @ 2002-03-13 08:51:24 by ben] The big ben-mule-21-5 check-in! Various files were added and deleted. See CHANGES-ben-mule. There are still some test suite failures. No crashes, though. Many of the failures have to do with problems in the test suite itself rather than in the actual code. I'll be addressing these in the next day or so -- none of the test suite failures are at all critical. Meanwhile I'll be trying to address the biggest issues -- i.e. build or run failures, which will almost certainly happen on various platforms. All comments should be sent to ben@xemacs.org -- use a Cc: if necessary when sending to mailing lists. There will be pre- and post- tags, something like pre-ben-mule-21-5-merge-in, and post-ben-mule-21-5-merge-in.
author ben
date Wed, 13 Mar 2002 08:54:06 +0000
parents 760db937b9ee
children e38acbeb1cae
comparison
equal deleted inserted replaced
770:336a418893b5 771:943eaba38521
1 /* Buffer insertion/deletion and gap motion for XEmacs. 1 /* Buffer insertion/deletion and gap motion for XEmacs.
2 Copyright (C) 1985, 1986, 1991, 1992, 1993, 1994, 1995 2 Copyright (C) 1985, 1986, 1991, 1992, 1993, 1994, 1995
3 Free Software Foundation, Inc. 3 Free Software Foundation, Inc.
4 Copyright (C) 1995 Sun Microsystems, Inc. 4 Copyright (C) 1995 Sun Microsystems, Inc.
5 Copyright (C) 2001 Ben Wing.
5 6
6 This file is part of XEmacs. 7 This file is part of XEmacs.
7 8
8 XEmacs is free software; you can redistribute it and/or modify it 9 XEmacs is free software; you can redistribute it and/or modify it
9 under the terms of the GNU General Public License as published by the 10 under the terms of the GNU General Public License as published by the
23 /* Synched up with: Mule 2.0, FSF 19.30. Diverges significantly. */ 24 /* Synched up with: Mule 2.0, FSF 19.30. Diverges significantly. */
24 25
25 /* This file has been Mule-ized. */ 26 /* This file has been Mule-ized. */
26 27
27 /* Overhauled by Ben Wing, December 1994, for Mule implementation. */ 28 /* Overhauled by Ben Wing, December 1994, for Mule implementation. */
28
29 /*
30 There are three possible ways to specify positions in a buffer. All
31 of these are one-based: the beginning of the buffer is position or
32 index 1, and 0 is not a valid position.
33
34 As a "buffer position" (typedef Charbpos):
35
36 This is an index specifying an offset in characters from the
37 beginning of the buffer. Note that buffer positions are
38 logically *between* characters, not on a character. The
39 difference between two buffer positions specifies the number of
40 characters between those positions. Buffer positions are the
41 only kind of position externally visible to the user.
42
43 As a "byte index" (typedef Bytebpos):
44
45 This is an index over the bytes used to represent the characters
46 in the buffer. If there is no Mule support, this is identical
47 to a buffer position, because each character is represented
48 using one byte. However, with Mule support, many characters
49 require two or more bytes for their representation, and so a
50 byte index may be greater than the corresponding buffer
51 position.
52
53 As a "memory index" (typedef Membpos):
54
55 This is the byte index adjusted for the gap. For positions
56 before the gap, this is identical to the byte index. For
57 positions after the gap, this is the byte index plus the gap
58 size. There are two possible memory indices for the gap
59 position; the memory index at the beginning of the gap should
60 always be used, except in code that deals with manipulating the
61 gap, where both indices may be seen. The address of the
62 character "at" (i.e. following) a particular position can be
63 obtained from the formula
64
65 buffer_start_address + memory_index(position) - 1
66
67 except in the case of characters at the gap position.
68
69 Other typedefs:
70 ===============
71
72 Emchar:
73 -------
74 This typedef represents a single Emacs character, which can be
75 ASCII, ISO-8859, or some extended character, as would typically
76 be used for Kanji. Note that the representation of a character
77 as an Emchar is *not* the same as the representation of that
78 same character in a string; thus, you cannot do the standard
79 C trick of passing a pointer to a character to a function that
80 expects a string.
81
82 An Emchar takes up 19 bits of representation and (for code
83 compatibility and such) is compatible with an int. This
84 representation is visible on the Lisp level. The important
85 characteristics of the Emchar representation are
86
87 -- values 0x00 - 0x7f represent ASCII.
88 -- values 0x80 - 0xff represent the right half of ISO-8859-1.
89 -- values 0x100 and up represent all other characters.
90
91 This means that Emchar values are upwardly compatible with
92 the standard 8-bit representation of ASCII/ISO-8859-1.
93
94 Intbyte:
95 --------
96 The data in a buffer or string is logically made up of Intbyte
97 objects, where a Intbyte takes up the same amount of space as a
98 char. (It is declared differently, though, to catch invalid
99 usages.) Strings stored using Intbytes are said to be in
100 "internal format". The important characteristics of internal
101 format are
102
103 -- ASCII characters are represented as a single Intbyte,
104 in the range 0 - 0x7f.
105 -- All other characters are represented as a Intbyte in
106 the range 0x80 - 0x9f followed by one or more Intbytes
107 in the range 0xa0 to 0xff.
108
109 This leads to a number of desirable properties:
110
111 -- Given the position of the beginning of a character,
112 you can find the beginning of the next or previous
113 character in constant time.
114 -- When searching for a substring or an ASCII character
115 within the string, you need merely use standard
116 searching routines.
117
118 array of char:
119 --------------
120 Strings that go in or out of Emacs are in "external format",
121 typedef'ed as an array of char or a char *. There is more
122 than one external format (JIS, EUC, etc.) but they all
123 have similar properties. They are modal encodings,
124 which is to say that the meaning of particular bytes is
125 not fixed but depends on what "mode" the string is currently
126 in (e.g. bytes in the range 0 - 0x7f might be
127 interpreted as ASCII, or as Hiragana, or as 2-byte Kanji,
128 depending on the current mode). The mode starts out in
129 ASCII/ISO-8859-1 and is switched using escape sequences --
130 for example, in the JIS encoding, 'ESC $ B' switches to a
131 mode where pairs of bytes in the range 0 - 0x7f
132 are interpreted as Kanji characters.
133
134 External-formatted data is generally desirable for passing
135 data between programs because it is upwardly compatible
136 with standard ASCII/ISO-8859-1 strings and may require
137 less space than internal encodings such as the one
138 described above. In addition, some encodings (e.g. JIS)
139 keep all characters (except the ESC used to switch modes)
140 in the printing ASCII range 0x20 - 0x7e, which results in
141 a much higher probability that the data will avoid being
142 garbled in transmission. Externally-formatted data is
143 generally not very convenient to work with, however, and
144 for this reason is usually converted to internal format
145 before any work is done on the string.
146
147 NOTE: filenames need to be in external format so that
148 ISO-8859-1 characters come out correctly.
149
150 Charcount:
151 ----------
152 This typedef represents a count of characters, such as
153 a character offset into a string or the number of
154 characters between two positions in a buffer. The
155 difference between two Charbpos's is a Charcount, and
156 character positions in a string are represented using
157 a Charcount.
158
159 Bytecount:
160 ----------
161 Similar to a Charcount but represents a count of bytes.
162 The difference between two Bytebpos's is a Bytecount.
163
164
165 Usage of the various representations:
166 =====================================
167
168 Memory indices are used in low-level functions in insdel.c and for
169 extent endpoints and marker positions. The reason for this is that
170 this way, the extents and markers don't need to be updated for most
171 insertions, which merely shrink the gap and don't move any
172 characters around in memory.
173
174 (The beginning-of-gap memory index simplifies insertions w.r.t.
175 markers, because text usually gets inserted after markers. For
176 extents, it is merely for consistency, because text can get
177 inserted either before or after an extent's endpoint depending on
178 the open/closedness of the endpoint.)
179
180 Byte indices are used in other code that needs to be fast,
181 such as the searching, redisplay, and extent-manipulation code.
182
183 Buffer positions are used in all other code. This is because this
184 representation is easiest to work with (especially since Lisp
185 code always uses buffer positions), necessitates the fewest
186 changes to existing code, and is the safest (e.g. if the text gets
187 shifted underneath a buffer position, it will still point to a
188 character; if text is shifted under a byte index, it might point
189 to the middle of a character, which would be bad).
190
191 Similarly, Charcounts are used in all code that deals with strings
192 except for code that needs to be fast, which used Bytecounts.
193
194 Strings are always passed around internally using internal format.
195 Conversions between external format are performed at the time
196 that the data goes in or out of Emacs.
197
198 Working with the various representations:
199 ========================================= */
200 29
201 #include <config.h> 30 #include <config.h>
202 #include "lisp.h" 31 #include "lisp.h"
203 32
204 #include "buffer.h" 33 #include "buffer.h"
207 #include "extents.h" 36 #include "extents.h"
208 #include "insdel.h" 37 #include "insdel.h"
209 #include "lstream.h" 38 #include "lstream.h"
210 #include "redisplay.h" 39 #include "redisplay.h"
211 #include "line-number.h" 40 #include "line-number.h"
212
213 /* We write things this way because it's very important the
214 MAX_BYTEBPOS_GAP_SIZE_3 is a multiple of 3. (As it happens,
215 65535 is a multiple of 3, but this may not always be the
216 case.) */
217
218 #define MAX_CHARBPOS_GAP_SIZE_3 (65535/3)
219 #define MAX_BYTEBPOS_GAP_SIZE_3 (3 * MAX_CHARBPOS_GAP_SIZE_3)
220
221 short three_to_one_table[1 + MAX_BYTEBPOS_GAP_SIZE_3];
222 41
223 /* Various macros modelled along the lines of those in buffer.h. 42 /* Various macros modelled along the lines of those in buffer.h.
224 Purposefully omitted from buffer.h because files other than this 43 Purposefully omitted from buffer.h because files other than this
225 one should not be using them. */ 44 one should not be using them. */
226 45
284 # define SET_END_SENTINEL(buf) 103 # define SET_END_SENTINEL(buf)
285 #endif 104 #endif
286 105
287 106
288 /************************************************************************/ 107 /************************************************************************/
289 /* Charcount/Bytecount conversion */
290 /************************************************************************/
291
292 /* Optimization. Do it. Live it. Love it. */
293
294 #ifdef MULE
295
296 /* We include the basic functions here that require no specific
297 knowledge of how data is Mule-encoded into a buffer other
298 than the basic (00 - 7F), (80 - 9F), (A0 - FF) scheme.
299 Anything that requires more specific knowledge goes into
300 mule-charset.c. */
301
302 /* Given a pointer to a text string and a length in bytes, return
303 the equivalent length in characters. */
304
305 Charcount
306 bytecount_to_charcount (const Intbyte *ptr, Bytecount len)
307 {
308 Charcount count = 0;
309 const Intbyte *end = ptr + len;
310
311 #if SIZEOF_LONG == 8
312 # define STRIDE_TYPE long
313 # define HIGH_BIT_MASK 0x8080808080808080UL
314 #elif SIZEOF_LONG_LONG == 8 && !(defined (i386) || defined (__i386__))
315 # define STRIDE_TYPE long long
316 # define HIGH_BIT_MASK 0x8080808080808080ULL
317 #elif SIZEOF_LONG == 4
318 # define STRIDE_TYPE long
319 # define HIGH_BIT_MASK 0x80808080UL
320 #else
321 # error Add support for 128-bit systems here
322 #endif
323
324 #define ALIGN_BITS ((EMACS_UINT) (ALIGNOF (STRIDE_TYPE) - 1))
325 #define ALIGN_MASK (~ ALIGN_BITS)
326 #define ALIGNED(ptr) ((((EMACS_UINT) ptr) & ALIGN_BITS) == 0)
327 #define STRIDE sizeof (STRIDE_TYPE)
328
329 while (ptr < end)
330 {
331 if (BYTE_ASCII_P (*ptr))
332 {
333 /* optimize for long stretches of ASCII */
334 if (! ALIGNED (ptr))
335 ptr++, count++;
336 else
337 {
338 const unsigned STRIDE_TYPE *ascii_end =
339 (const unsigned STRIDE_TYPE *) ptr;
340 /* This loop screams, because we can typically
341 detect ASCII characters 8 at a time. */
342 while ((const Intbyte *) ascii_end + STRIDE <= end
343 && !(*ascii_end & HIGH_BIT_MASK))
344 ascii_end++;
345 if ((Intbyte *) ascii_end == ptr)
346 ptr++, count++;
347 else
348 {
349 count += (Intbyte *) ascii_end - ptr;
350 ptr = (Intbyte *) ascii_end;
351 }
352 }
353 }
354 else
355 {
356 /* optimize for successive characters from the same charset */
357 Intbyte leading_byte = *ptr;
358 Bytecount bytes = REP_BYTES_BY_FIRST_BYTE (leading_byte);
359 while ((ptr < end) && (*ptr == leading_byte))
360 ptr += bytes, count++;
361 }
362 }
363
364 #ifdef ERROR_CHECK_CHARBPOS
365 /* Bomb out if the specified substring ends in the middle
366 of a character. Note that we might have already gotten
367 a core dump above from an invalid reference, but at least
368 we will get no farther than here. */
369 assert (ptr == end);
370 #endif
371
372 return count;
373 }
374
375 /* Given a pointer to a text string and a length in characters, return
376 the equivalent length in bytes. */
377
378 Bytecount
379 charcount_to_bytecount (const Intbyte *ptr, Charcount len)
380 {
381 const Intbyte *newptr = ptr;
382
383 while (len > 0)
384 {
385 INC_CHARPTR (newptr);
386 len--;
387 }
388 return newptr - ptr;
389 }
390
391 /* The next two functions are the actual meat behind the
392 charbpos-to-bytebpos and bytebpos-to-charbpos conversions. Currently
393 the method they use is fairly unsophisticated; see buffer.h.
394
395 Note that charbpos_to_bytebpos_func() is probably the most-called
396 function in all of XEmacs. Therefore, it must be FAST FAST FAST.
397 This is the reason why so much of the code is duplicated.
398
399 Similar considerations apply to bytebpos_to_charbpos_func(), although
400 less so because the function is not called so often.
401
402 #### At some point this should use a more sophisticated method;
403 see buffer.h. */
404
405 static int not_very_random_number;
406
407 Bytebpos
408 charbpos_to_bytebpos_func (struct buffer *buf, Charbpos x)
409 {
410 Charbpos bufmin;
411 Charbpos bufmax;
412 Bytebpos bytmin;
413 Bytebpos bytmax;
414 int size;
415 int forward_p;
416 Bytebpos retval;
417 int diff_so_far;
418 int add_to_cache = 0;
419
420 /* Check for some cached positions, for speed. */
421 if (x == BUF_PT (buf))
422 return BI_BUF_PT (buf);
423 if (x == BUF_ZV (buf))
424 return BI_BUF_ZV (buf);
425 if (x == BUF_BEGV (buf))
426 return BI_BUF_BEGV (buf);
427
428 bufmin = buf->text->mule_bufmin;
429 bufmax = buf->text->mule_bufmax;
430 bytmin = buf->text->mule_bytmin;
431 bytmax = buf->text->mule_bytmax;
432 size = (1 << buf->text->mule_shifter) + !!buf->text->mule_three_p;
433
434 /* The basic idea here is that we shift the "known region" up or down
435 until it overlaps the specified position. We do this by moving
436 the upper bound of the known region up one character at a time,
437 and moving the lower bound of the known region up as necessary
438 when the size of the character just seen changes.
439
440 We optimize this, however, by first shifting the known region to
441 one of the cached points if it's close by. (We don't check BEG or
442 Z, even though they're cached; most of the time these will be the
443 same as BEGV and ZV, and when they're not, they're not likely
444 to be used.) */
445
446 if (x > bufmax)
447 {
448 Charbpos diffmax = x - bufmax;
449 Charbpos diffpt = x - BUF_PT (buf);
450 Charbpos diffzv = BUF_ZV (buf) - x;
451 /* #### This value could stand some more exploration. */
452 Charcount heuristic_hack = (bufmax - bufmin) >> 2;
453
454 /* Check if the position is closer to PT or ZV than to the
455 end of the known region. */
456
457 if (diffpt < 0)
458 diffpt = -diffpt;
459 if (diffzv < 0)
460 diffzv = -diffzv;
461
462 /* But also implement a heuristic that favors the known region
463 over PT or ZV. The reason for this is that switching to
464 PT or ZV will wipe out the knowledge in the known region,
465 which might be annoying if the known region is large and
466 PT or ZV is not that much closer than the end of the known
467 region. */
468
469 diffzv += heuristic_hack;
470 diffpt += heuristic_hack;
471 if (diffpt < diffmax && diffpt <= diffzv)
472 {
473 bufmax = bufmin = BUF_PT (buf);
474 bytmax = bytmin = BI_BUF_PT (buf);
475 /* We set the size to 1 even though it doesn't really
476 matter because the new known region contains no
477 characters. We do this because this is the most
478 likely size of the characters around the new known
479 region, and we avoid potential yuckiness that is
480 done when size == 3. */
481 size = 1;
482 }
483 if (diffzv < diffmax)
484 {
485 bufmax = bufmin = BUF_ZV (buf);
486 bytmax = bytmin = BI_BUF_ZV (buf);
487 size = 1;
488 }
489 }
490 #ifdef ERROR_CHECK_CHARBPOS
491 else if (x >= bufmin)
492 abort ();
493 #endif
494 else
495 {
496 Charbpos diffmin = bufmin - x;
497 Charbpos diffpt = BUF_PT (buf) - x;
498 Charbpos diffbegv = x - BUF_BEGV (buf);
499 /* #### This value could stand some more exploration. */
500 Charcount heuristic_hack = (bufmax - bufmin) >> 2;
501
502 if (diffpt < 0)
503 diffpt = -diffpt;
504 if (diffbegv < 0)
505 diffbegv = -diffbegv;
506
507 /* But also implement a heuristic that favors the known region --
508 see above. */
509
510 diffbegv += heuristic_hack;
511 diffpt += heuristic_hack;
512
513 if (diffpt < diffmin && diffpt <= diffbegv)
514 {
515 bufmax = bufmin = BUF_PT (buf);
516 bytmax = bytmin = BI_BUF_PT (buf);
517 /* We set the size to 1 even though it doesn't really
518 matter because the new known region contains no
519 characters. We do this because this is the most
520 likely size of the characters around the new known
521 region, and we avoid potential yuckiness that is
522 done when size == 3. */
523 size = 1;
524 }
525 if (diffbegv < diffmin)
526 {
527 bufmax = bufmin = BUF_BEGV (buf);
528 bytmax = bytmin = BI_BUF_BEGV (buf);
529 size = 1;
530 }
531 }
532
533 diff_so_far = x > bufmax ? x - bufmax : bufmin - x;
534 if (diff_so_far > 50)
535 {
536 /* If we have to move more than a certain amount, then look
537 into our cache. */
538 int minval = INT_MAX;
539 int found = 0;
540 int i;
541
542 add_to_cache = 1;
543 /* I considered keeping the positions ordered. This would speed
544 up this loop, but updating the cache would take longer, so
545 it doesn't seem like it would really matter. */
546 for (i = 0; i < 16; i++)
547 {
548 int diff = buf->text->mule_charbpos_cache[i] - x;
549
550 if (diff < 0)
551 diff = -diff;
552 if (diff < minval)
553 {
554 minval = diff;
555 found = i;
556 }
557 }
558
559 if (minval < diff_so_far)
560 {
561 bufmax = bufmin = buf->text->mule_charbpos_cache[found];
562 bytmax = bytmin = buf->text->mule_bytebpos_cache[found];
563 size = 1;
564 }
565 }
566
567 /* It's conceivable that the caching above could lead to X being
568 the same as one of the range edges. */
569 if (x >= bufmax)
570 {
571 Bytebpos newmax;
572 Bytecount newsize;
573
574 forward_p = 1;
575 while (x > bufmax)
576 {
577 newmax = bytmax;
578
579 INC_BYTEBPOS (buf, newmax);
580 newsize = newmax - bytmax;
581 if (newsize != size)
582 {
583 bufmin = bufmax;
584 bytmin = bytmax;
585 size = newsize;
586 }
587 bytmax = newmax;
588 bufmax++;
589 }
590 retval = bytmax;
591
592 /* #### Should go past the found location to reduce the number
593 of times that this function is called */
594 }
595 else /* x < bufmin */
596 {
597 Bytebpos newmin;
598 Bytecount newsize;
599
600 forward_p = 0;
601 while (x < bufmin)
602 {
603 newmin = bytmin;
604
605 DEC_BYTEBPOS (buf, newmin);
606 newsize = bytmin - newmin;
607 if (newsize != size)
608 {
609 bufmax = bufmin;
610 bytmax = bytmin;
611 size = newsize;
612 }
613 bytmin = newmin;
614 bufmin--;
615 }
616 retval = bytmin;
617
618 /* #### Should go past the found location to reduce the number
619 of times that this function is called
620 */
621 }
622
623 /* If size is three, than we have to max sure that the range we
624 discovered isn't too large, because we use a fixed-length
625 table to divide by 3. */
626
627 if (size == 3)
628 {
629 int gap = bytmax - bytmin;
630 buf->text->mule_three_p = 1;
631 buf->text->mule_shifter = 1;
632
633 if (gap > MAX_BYTEBPOS_GAP_SIZE_3)
634 {
635 if (forward_p)
636 {
637 bytmin = bytmax - MAX_BYTEBPOS_GAP_SIZE_3;
638 bufmin = bufmax - MAX_CHARBPOS_GAP_SIZE_3;
639 }
640 else
641 {
642 bytmax = bytmin + MAX_BYTEBPOS_GAP_SIZE_3;
643 bufmax = bufmin + MAX_CHARBPOS_GAP_SIZE_3;
644 }
645 }
646 }
647 else
648 {
649 buf->text->mule_three_p = 0;
650 if (size == 4)
651 buf->text->mule_shifter = 2;
652 else
653 buf->text->mule_shifter = size - 1;
654 }
655
656 buf->text->mule_bufmin = bufmin;
657 buf->text->mule_bufmax = bufmax;
658 buf->text->mule_bytmin = bytmin;
659 buf->text->mule_bytmax = bytmax;
660
661 if (add_to_cache)
662 {
663 int replace_loc;
664
665 /* We throw away a "random" cached value and replace it with
666 the new value. It doesn't actually have to be very random
667 at all, just evenly distributed.
668
669 #### It would be better to use a least-recently-used algorithm
670 or something that tries to space things out, but I'm not sure
671 it's worth it to go to the trouble of maintaining that. */
672 not_very_random_number += 621;
673 replace_loc = not_very_random_number & 15;
674 buf->text->mule_charbpos_cache[replace_loc] = x;
675 buf->text->mule_bytebpos_cache[replace_loc] = retval;
676 }
677
678 return retval;
679 }
680
681 /* The logic in this function is almost identical to the logic in
682 the previous function. */
683
684 Charbpos
685 bytebpos_to_charbpos_func (struct buffer *buf, Bytebpos x)
686 {
687 Charbpos bufmin;
688 Charbpos bufmax;
689 Bytebpos bytmin;
690 Bytebpos bytmax;
691 int size;
692 int forward_p;
693 Charbpos retval;
694 int diff_so_far;
695 int add_to_cache = 0;
696
697 /* Check for some cached positions, for speed. */
698 if (x == BI_BUF_PT (buf))
699 return BUF_PT (buf);
700 if (x == BI_BUF_ZV (buf))
701 return BUF_ZV (buf);
702 if (x == BI_BUF_BEGV (buf))
703 return BUF_BEGV (buf);
704
705 bufmin = buf->text->mule_bufmin;
706 bufmax = buf->text->mule_bufmax;
707 bytmin = buf->text->mule_bytmin;
708 bytmax = buf->text->mule_bytmax;
709 size = (1 << buf->text->mule_shifter) + !!buf->text->mule_three_p;
710
711 /* The basic idea here is that we shift the "known region" up or down
712 until it overlaps the specified position. We do this by moving
713 the upper bound of the known region up one character at a time,
714 and moving the lower bound of the known region up as necessary
715 when the size of the character just seen changes.
716
717 We optimize this, however, by first shifting the known region to
718 one of the cached points if it's close by. (We don't check BI_BEG or
719 BI_Z, even though they're cached; most of the time these will be the
720 same as BI_BEGV and BI_ZV, and when they're not, they're not likely
721 to be used.) */
722
723 if (x > bytmax)
724 {
725 Bytebpos diffmax = x - bytmax;
726 Bytebpos diffpt = x - BI_BUF_PT (buf);
727 Bytebpos diffzv = BI_BUF_ZV (buf) - x;
728 /* #### This value could stand some more exploration. */
729 Bytecount heuristic_hack = (bytmax - bytmin) >> 2;
730
731 /* Check if the position is closer to PT or ZV than to the
732 end of the known region. */
733
734 if (diffpt < 0)
735 diffpt = -diffpt;
736 if (diffzv < 0)
737 diffzv = -diffzv;
738
739 /* But also implement a heuristic that favors the known region
740 over BI_PT or BI_ZV. The reason for this is that switching to
741 BI_PT or BI_ZV will wipe out the knowledge in the known region,
742 which might be annoying if the known region is large and
743 BI_PT or BI_ZV is not that much closer than the end of the known
744 region. */
745
746 diffzv += heuristic_hack;
747 diffpt += heuristic_hack;
748 if (diffpt < diffmax && diffpt <= diffzv)
749 {
750 bufmax = bufmin = BUF_PT (buf);
751 bytmax = bytmin = BI_BUF_PT (buf);
752 /* We set the size to 1 even though it doesn't really
753 matter because the new known region contains no
754 characters. We do this because this is the most
755 likely size of the characters around the new known
756 region, and we avoid potential yuckiness that is
757 done when size == 3. */
758 size = 1;
759 }
760 if (diffzv < diffmax)
761 {
762 bufmax = bufmin = BUF_ZV (buf);
763 bytmax = bytmin = BI_BUF_ZV (buf);
764 size = 1;
765 }
766 }
767 #ifdef ERROR_CHECK_CHARBPOS
768 else if (x >= bytmin)
769 abort ();
770 #endif
771 else
772 {
773 Bytebpos diffmin = bytmin - x;
774 Bytebpos diffpt = BI_BUF_PT (buf) - x;
775 Bytebpos diffbegv = x - BI_BUF_BEGV (buf);
776 /* #### This value could stand some more exploration. */
777 Bytecount heuristic_hack = (bytmax - bytmin) >> 2;
778
779 if (diffpt < 0)
780 diffpt = -diffpt;
781 if (diffbegv < 0)
782 diffbegv = -diffbegv;
783
784 /* But also implement a heuristic that favors the known region --
785 see above. */
786
787 diffbegv += heuristic_hack;
788 diffpt += heuristic_hack;
789
790 if (diffpt < diffmin && diffpt <= diffbegv)
791 {
792 bufmax = bufmin = BUF_PT (buf);
793 bytmax = bytmin = BI_BUF_PT (buf);
794 /* We set the size to 1 even though it doesn't really
795 matter because the new known region contains no
796 characters. We do this because this is the most
797 likely size of the characters around the new known
798 region, and we avoid potential yuckiness that is
799 done when size == 3. */
800 size = 1;
801 }
802 if (diffbegv < diffmin)
803 {
804 bufmax = bufmin = BUF_BEGV (buf);
805 bytmax = bytmin = BI_BUF_BEGV (buf);
806 size = 1;
807 }
808 }
809
810 diff_so_far = x > bytmax ? x - bytmax : bytmin - x;
811 if (diff_so_far > 50)
812 {
813 /* If we have to move more than a certain amount, then look
814 into our cache. */
815 int minval = INT_MAX;
816 int found = 0;
817 int i;
818
819 add_to_cache = 1;
820 /* I considered keeping the positions ordered. This would speed
821 up this loop, but updating the cache would take longer, so
822 it doesn't seem like it would really matter. */
823 for (i = 0; i < 16; i++)
824 {
825 int diff = buf->text->mule_bytebpos_cache[i] - x;
826
827 if (diff < 0)
828 diff = -diff;
829 if (diff < minval)
830 {
831 minval = diff;
832 found = i;
833 }
834 }
835
836 if (minval < diff_so_far)
837 {
838 bufmax = bufmin = buf->text->mule_charbpos_cache[found];
839 bytmax = bytmin = buf->text->mule_bytebpos_cache[found];
840 size = 1;
841 }
842 }
843
844 /* It's conceivable that the caching above could lead to X being
845 the same as one of the range edges. */
846 if (x >= bytmax)
847 {
848 Bytebpos newmax;
849 Bytecount newsize;
850
851 forward_p = 1;
852 while (x > bytmax)
853 {
854 newmax = bytmax;
855
856 INC_BYTEBPOS (buf, newmax);
857 newsize = newmax - bytmax;
858 if (newsize != size)
859 {
860 bufmin = bufmax;
861 bytmin = bytmax;
862 size = newsize;
863 }
864 bytmax = newmax;
865 bufmax++;
866 }
867 retval = bufmax;
868
869 /* #### Should go past the found location to reduce the number
870 of times that this function is called */
871 }
872 else /* x <= bytmin */
873 {
874 Bytebpos newmin;
875 Bytecount newsize;
876
877 forward_p = 0;
878 while (x < bytmin)
879 {
880 newmin = bytmin;
881
882 DEC_BYTEBPOS (buf, newmin);
883 newsize = bytmin - newmin;
884 if (newsize != size)
885 {
886 bufmax = bufmin;
887 bytmax = bytmin;
888 size = newsize;
889 }
890 bytmin = newmin;
891 bufmin--;
892 }
893 retval = bufmin;
894
895 /* #### Should go past the found location to reduce the number
896 of times that this function is called
897 */
898 }
899
900 /* If size is three, than we have to max sure that the range we
901 discovered isn't too large, because we use a fixed-length
902 table to divide by 3. */
903
904 if (size == 3)
905 {
906 int gap = bytmax - bytmin;
907 buf->text->mule_three_p = 1;
908 buf->text->mule_shifter = 1;
909
910 if (gap > MAX_BYTEBPOS_GAP_SIZE_3)
911 {
912 if (forward_p)
913 {
914 bytmin = bytmax - MAX_BYTEBPOS_GAP_SIZE_3;
915 bufmin = bufmax - MAX_CHARBPOS_GAP_SIZE_3;
916 }
917 else
918 {
919 bytmax = bytmin + MAX_BYTEBPOS_GAP_SIZE_3;
920 bufmax = bufmin + MAX_CHARBPOS_GAP_SIZE_3;
921 }
922 }
923 }
924 else
925 {
926 buf->text->mule_three_p = 0;
927 if (size == 4)
928 buf->text->mule_shifter = 2;
929 else
930 buf->text->mule_shifter = size - 1;
931 }
932
933 buf->text->mule_bufmin = bufmin;
934 buf->text->mule_bufmax = bufmax;
935 buf->text->mule_bytmin = bytmin;
936 buf->text->mule_bytmax = bytmax;
937
938 if (add_to_cache)
939 {
940 int replace_loc;
941
942 /* We throw away a "random" cached value and replace it with
943 the new value. It doesn't actually have to be very random
944 at all, just evenly distributed.
945
946 #### It would be better to use a least-recently-used algorithm
947 or something that tries to space things out, but I'm not sure
948 it's worth it to go to the trouble of maintaining that. */
949 not_very_random_number += 621;
950 replace_loc = not_very_random_number & 15;
951 buf->text->mule_charbpos_cache[replace_loc] = retval;
952 buf->text->mule_bytebpos_cache[replace_loc] = x;
953 }
954
955 return retval;
956 }
957
958 /* Text of length BYTELENGTH and CHARLENGTH (in different units)
959 was inserted at charbpos START. */
960
961 static void
962 buffer_mule_signal_inserted_region (struct buffer *buf, Charbpos start,
963 Bytecount bytelength,
964 Charcount charlength)
965 {
966 int size = (1 << buf->text->mule_shifter) + !!buf->text->mule_three_p;
967 int i;
968
969 /* Adjust the cache of known positions. */
970 for (i = 0; i < 16; i++)
971 {
972
973 if (buf->text->mule_charbpos_cache[i] > start)
974 {
975 buf->text->mule_charbpos_cache[i] += charlength;
976 buf->text->mule_bytebpos_cache[i] += bytelength;
977 }
978 }
979
980 if (start >= buf->text->mule_bufmax)
981 return;
982
983 /* The insertion is either before the known region, in which case
984 it shoves it forward; or within the known region, in which case
985 it shoves the end forward. (But it may make the known region
986 inconsistent, so we may have to shorten it.) */
987
988 if (start <= buf->text->mule_bufmin)
989 {
990 buf->text->mule_bufmin += charlength;
991 buf->text->mule_bufmax += charlength;
992 buf->text->mule_bytmin += bytelength;
993 buf->text->mule_bytmax += bytelength;
994 }
995 else
996 {
997 Charbpos end = start + charlength;
998 /* the insertion point divides the known region in two.
999 Keep the longer half, at least, and expand into the
1000 inserted chunk as much as possible. */
1001
1002 if (start - buf->text->mule_bufmin > buf->text->mule_bufmax - start)
1003 {
1004 Bytebpos bytestart = (buf->text->mule_bytmin
1005 + size * (start - buf->text->mule_bufmin));
1006 Bytebpos bytenew;
1007
1008 while (start < end)
1009 {
1010 bytenew = bytestart;
1011 INC_BYTEBPOS (buf, bytenew);
1012 if (bytenew - bytestart != size)
1013 break;
1014 start++;
1015 bytestart = bytenew;
1016 }
1017 if (start != end)
1018 {
1019 buf->text->mule_bufmax = start;
1020 buf->text->mule_bytmax = bytestart;
1021 }
1022 else
1023 {
1024 buf->text->mule_bufmax += charlength;
1025 buf->text->mule_bytmax += bytelength;
1026 }
1027 }
1028 else
1029 {
1030 Bytebpos byteend = (buf->text->mule_bytmin
1031 + size * (start - buf->text->mule_bufmin)
1032 + bytelength);
1033 Bytebpos bytenew;
1034
1035 buf->text->mule_bufmax += charlength;
1036 buf->text->mule_bytmax += bytelength;
1037
1038 while (end > start)
1039 {
1040 bytenew = byteend;
1041 DEC_BYTEBPOS (buf, bytenew);
1042 if (byteend - bytenew != size)
1043 break;
1044 end--;
1045 byteend = bytenew;
1046 }
1047 if (start != end)
1048 {
1049 buf->text->mule_bufmin = end;
1050 buf->text->mule_bytmin = byteend;
1051 }
1052 }
1053 }
1054 }
1055
1056 /* Text from START to END (equivalent in Bytebposs: from BI_START to
1057 BI_END) was deleted. */
1058
1059 static void
1060 buffer_mule_signal_deleted_region (struct buffer *buf, Charbpos start,
1061 Charbpos end, Bytebpos bi_start,
1062 Bytebpos bi_end)
1063 {
1064 int i;
1065
1066 /* Adjust the cache of known positions. */
1067 for (i = 0; i < 16; i++)
1068 {
1069 /* After the end; gets shoved backward */
1070 if (buf->text->mule_charbpos_cache[i] > end)
1071 {
1072 buf->text->mule_charbpos_cache[i] -= end - start;
1073 buf->text->mule_bytebpos_cache[i] -= bi_end - bi_start;
1074 }
1075 /* In the range; moves to start of range */
1076 else if (buf->text->mule_charbpos_cache[i] > start)
1077 {
1078 buf->text->mule_charbpos_cache[i] = start;
1079 buf->text->mule_bytebpos_cache[i] = bi_start;
1080 }
1081 }
1082
1083 /* We don't care about any text after the end of the known region. */
1084
1085 end = min (end, buf->text->mule_bufmax);
1086 bi_end = min (bi_end, buf->text->mule_bytmax);
1087 if (start >= end)
1088 return;
1089
1090 /* The end of the known region offsets by the total amount of deletion,
1091 since it's all before it. */
1092
1093 buf->text->mule_bufmax -= end - start;
1094 buf->text->mule_bytmax -= bi_end - bi_start;
1095
1096 /* Now we don't care about any text after the start of the known region. */
1097
1098 end = min (end, buf->text->mule_bufmin);
1099 bi_end = min (bi_end, buf->text->mule_bytmin);
1100 if (start >= end)
1101 return;
1102
1103 buf->text->mule_bufmin -= end - start;
1104 buf->text->mule_bytmin -= bi_end - bi_start;
1105 }
1106
1107 #endif /* MULE */
1108
1109 #ifdef ERROR_CHECK_CHARBPOS
1110
1111 Bytebpos
1112 charbpos_to_bytebpos (struct buffer *buf, Charbpos x)
1113 {
1114 Bytebpos retval = real_charbpos_to_bytebpos (buf, x);
1115 ASSERT_VALID_BYTEBPOS_UNSAFE (buf, retval);
1116 return retval;
1117 }
1118
1119 Charbpos
1120 bytebpos_to_charbpos (struct buffer *buf, Bytebpos x)
1121 {
1122 ASSERT_VALID_BYTEBPOS_UNSAFE (buf, x);
1123 return real_bytebpos_to_charbpos (buf, x);
1124 }
1125
1126 #endif /* ERROR_CHECK_CHARBPOS */
1127
1128
1129 /************************************************************************/
1130 /* verifying buffer and string positions */
1131 /************************************************************************/
1132
1133 /* Functions below are tagged with either _byte or _char indicating
1134 whether they return byte or character positions. For a buffer,
1135 a character position is a "Charbpos" and a byte position is a "Bytebpos".
1136 For strings, these are sometimes typed using "Charcount" and
1137 "Bytecount". */
1138
1139 /* Flags for the functions below are:
1140
1141 GB_ALLOW_PAST_ACCESSIBLE
1142
1143 Allow positions to range over the entire buffer (BUF_BEG to BUF_Z),
1144 rather than just the accessible portion (BUF_BEGV to BUF_ZV).
1145 For strings, this flag has no effect.
1146
1147 GB_COERCE_RANGE
1148
1149 If the position is outside the allowable range, return the lower
1150 or upper bound of the range, whichever is closer to the specified
1151 position.
1152
1153 GB_NO_ERROR_IF_BAD
1154
1155 If the position is outside the allowable range, return -1.
1156
1157 GB_NEGATIVE_FROM_END
1158
1159 If a value is negative, treat it as an offset from the end.
1160 Only applies to strings.
1161
1162 The following additional flags apply only to the functions
1163 that return ranges:
1164
1165 GB_ALLOW_NIL
1166
1167 Either or both positions can be nil. If FROM is nil,
1168 FROM_OUT will contain the lower bound of the allowed range.
1169 If TO is nil, TO_OUT will contain the upper bound of the
1170 allowed range.
1171
1172 GB_CHECK_ORDER
1173
1174 FROM must contain the lower bound and TO the upper bound
1175 of the range. If the positions are reversed, an error is
1176 signalled.
1177
1178 The following is a combination flag:
1179
1180 GB_HISTORICAL_STRING_BEHAVIOR
1181
1182 Equivalent to (GB_NEGATIVE_FROM_END | GB_ALLOW_NIL).
1183 */
1184
1185 /* Return a buffer position stored in a Lisp_Object. Full
1186 error-checking is done on the position. Flags can be specified to
1187 control the behavior of out-of-range values. The default behavior
1188 is to require that the position is within the accessible part of
1189 the buffer (BEGV and ZV), and to signal an error if the position is
1190 out of range.
1191
1192 */
1193
1194 Charbpos
1195 get_buffer_pos_char (struct buffer *b, Lisp_Object pos, unsigned int flags)
1196 {
1197 /* Does not GC */
1198 Charbpos ind;
1199 Charbpos min_allowed, max_allowed;
1200
1201 CHECK_INT_COERCE_MARKER (pos);
1202 ind = XINT (pos);
1203 min_allowed = flags & GB_ALLOW_PAST_ACCESSIBLE ? BUF_BEG (b) : BUF_BEGV (b);
1204 max_allowed = flags & GB_ALLOW_PAST_ACCESSIBLE ? BUF_Z (b) : BUF_ZV (b);
1205
1206 if (ind < min_allowed || ind > max_allowed)
1207 {
1208 if (flags & GB_COERCE_RANGE)
1209 ind = ind < min_allowed ? min_allowed : max_allowed;
1210 else if (flags & GB_NO_ERROR_IF_BAD)
1211 ind = -1;
1212 else
1213 {
1214 Lisp_Object buffer;
1215 XSETBUFFER (buffer, b);
1216 args_out_of_range (buffer, pos);
1217 }
1218 }
1219
1220 return ind;
1221 }
1222
1223 Bytebpos
1224 get_buffer_pos_byte (struct buffer *b, Lisp_Object pos, unsigned int flags)
1225 {
1226 Charbpos bpos = get_buffer_pos_char (b, pos, flags);
1227 if (bpos < 0) /* could happen with GB_NO_ERROR_IF_BAD */
1228 return -1;
1229 return charbpos_to_bytebpos (b, bpos);
1230 }
1231
1232 /* Return a pair of buffer positions representing a range of text,
1233 taken from a pair of Lisp_Objects. Full error-checking is
1234 done on the positions. Flags can be specified to control the
1235 behavior of out-of-range values. The default behavior is to
1236 allow the range bounds to be specified in either order
1237 (however, FROM_OUT will always be the lower bound of the range
1238 and TO_OUT the upper bound),to require that the positions
1239 are within the accessible part of the buffer (BEGV and ZV),
1240 and to signal an error if the positions are out of range.
1241 */
1242
1243 void
1244 get_buffer_range_char (struct buffer *b, Lisp_Object from, Lisp_Object to,
1245 Charbpos *from_out, Charbpos *to_out, unsigned int flags)
1246 {
1247 /* Does not GC */
1248 Charbpos min_allowed, max_allowed;
1249
1250 min_allowed = (flags & GB_ALLOW_PAST_ACCESSIBLE) ?
1251 BUF_BEG (b) : BUF_BEGV (b);
1252 max_allowed = (flags & GB_ALLOW_PAST_ACCESSIBLE) ?
1253 BUF_Z (b) : BUF_ZV (b);
1254
1255 if (NILP (from) && (flags & GB_ALLOW_NIL))
1256 *from_out = min_allowed;
1257 else
1258 *from_out = get_buffer_pos_char (b, from, flags | GB_NO_ERROR_IF_BAD);
1259
1260 if (NILP (to) && (flags & GB_ALLOW_NIL))
1261 *to_out = max_allowed;
1262 else
1263 *to_out = get_buffer_pos_char (b, to, flags | GB_NO_ERROR_IF_BAD);
1264
1265 if ((*from_out < 0 || *to_out < 0) && !(flags & GB_NO_ERROR_IF_BAD))
1266 {
1267 Lisp_Object buffer;
1268 XSETBUFFER (buffer, b);
1269 args_out_of_range_3 (buffer, from, to);
1270 }
1271
1272 if (*from_out >= 0 && *to_out >= 0 && *from_out > *to_out)
1273 {
1274 if (flags & GB_CHECK_ORDER)
1275 invalid_argument_2 ("start greater than end", from, to);
1276 else
1277 {
1278 Charbpos temp = *from_out;
1279 *from_out = *to_out;
1280 *to_out = temp;
1281 }
1282 }
1283 }
1284
1285 void
1286 get_buffer_range_byte (struct buffer *b, Lisp_Object from, Lisp_Object to,
1287 Bytebpos *from_out, Bytebpos *to_out, unsigned int flags)
1288 {
1289 Charbpos s, e;
1290
1291 get_buffer_range_char (b, from, to, &s, &e, flags);
1292 if (s >= 0)
1293 *from_out = charbpos_to_bytebpos (b, s);
1294 else /* could happen with GB_NO_ERROR_IF_BAD */
1295 *from_out = -1;
1296 if (e >= 0)
1297 *to_out = charbpos_to_bytebpos (b, e);
1298 else
1299 *to_out = -1;
1300 }
1301
1302 static Charcount
1303 get_string_pos_char_1 (Lisp_Object string, Lisp_Object pos, unsigned int flags,
1304 Charcount known_length)
1305 {
1306 Charcount ccpos;
1307 Charcount min_allowed = 0;
1308 Charcount max_allowed = known_length;
1309
1310 /* Computation of KNOWN_LENGTH is potentially expensive so we pass
1311 it in. */
1312 CHECK_INT (pos);
1313 ccpos = XINT (pos);
1314 if (ccpos < 0 && flags & GB_NEGATIVE_FROM_END)
1315 ccpos += max_allowed;
1316
1317 if (ccpos < min_allowed || ccpos > max_allowed)
1318 {
1319 if (flags & GB_COERCE_RANGE)
1320 ccpos = ccpos < min_allowed ? min_allowed : max_allowed;
1321 else if (flags & GB_NO_ERROR_IF_BAD)
1322 ccpos = -1;
1323 else
1324 args_out_of_range (string, pos);
1325 }
1326
1327 return ccpos;
1328 }
1329
1330 Charcount
1331 get_string_pos_char (Lisp_Object string, Lisp_Object pos, unsigned int flags)
1332 {
1333 return get_string_pos_char_1 (string, pos, flags,
1334 XSTRING_CHAR_LENGTH (string));
1335 }
1336
1337 Bytecount
1338 get_string_pos_byte (Lisp_Object string, Lisp_Object pos, unsigned int flags)
1339 {
1340 Charcount ccpos = get_string_pos_char (string, pos, flags);
1341 if (ccpos < 0) /* could happen with GB_NO_ERROR_IF_BAD */
1342 return -1;
1343 return charcount_to_bytecount (XSTRING_DATA (string), ccpos);
1344 }
1345
1346 void
1347 get_string_range_char (Lisp_Object string, Lisp_Object from, Lisp_Object to,
1348 Charcount *from_out, Charcount *to_out,
1349 unsigned int flags)
1350 {
1351 Charcount min_allowed = 0;
1352 Charcount max_allowed = XSTRING_CHAR_LENGTH (string);
1353
1354 if (NILP (from) && (flags & GB_ALLOW_NIL))
1355 *from_out = min_allowed;
1356 else
1357 *from_out = get_string_pos_char_1 (string, from,
1358 flags | GB_NO_ERROR_IF_BAD,
1359 max_allowed);
1360
1361 if (NILP (to) && (flags & GB_ALLOW_NIL))
1362 *to_out = max_allowed;
1363 else
1364 *to_out = get_string_pos_char_1 (string, to,
1365 flags | GB_NO_ERROR_IF_BAD,
1366 max_allowed);
1367
1368 if ((*from_out < 0 || *to_out < 0) && !(flags & GB_NO_ERROR_IF_BAD))
1369 args_out_of_range_3 (string, from, to);
1370
1371 if (*from_out >= 0 && *to_out >= 0 && *from_out > *to_out)
1372 {
1373 if (flags & GB_CHECK_ORDER)
1374 invalid_argument_2 ("start greater than end", from, to);
1375 else
1376 {
1377 Charbpos temp = *from_out;
1378 *from_out = *to_out;
1379 *to_out = temp;
1380 }
1381 }
1382 }
1383
1384 void
1385 get_string_range_byte (Lisp_Object string, Lisp_Object from, Lisp_Object to,
1386 Bytecount *from_out, Bytecount *to_out,
1387 unsigned int flags)
1388 {
1389 Charcount s, e;
1390
1391 get_string_range_char (string, from, to, &s, &e, flags);
1392 if (s >= 0)
1393 *from_out = charcount_to_bytecount (XSTRING_DATA (string), s);
1394 else /* could happen with GB_NO_ERROR_IF_BAD */
1395 *from_out = -1;
1396 if (e >= 0)
1397 *to_out = charcount_to_bytecount (XSTRING_DATA (string), e);
1398 else
1399 *to_out = -1;
1400
1401 }
1402
1403 Charbpos
1404 get_buffer_or_string_pos_char (Lisp_Object object, Lisp_Object pos,
1405 unsigned int flags)
1406 {
1407 return STRINGP (object) ?
1408 get_string_pos_char (object, pos, flags) :
1409 get_buffer_pos_char (XBUFFER (object), pos, flags);
1410 }
1411
1412 Bytebpos
1413 get_buffer_or_string_pos_byte (Lisp_Object object, Lisp_Object pos,
1414 unsigned int flags)
1415 {
1416 return STRINGP (object) ?
1417 get_string_pos_byte (object, pos, flags) :
1418 get_buffer_pos_byte (XBUFFER (object), pos, flags);
1419 }
1420
1421 void
1422 get_buffer_or_string_range_char (Lisp_Object object, Lisp_Object from,
1423 Lisp_Object to, Charbpos *from_out,
1424 Charbpos *to_out, unsigned int flags)
1425 {
1426 if (STRINGP (object))
1427 get_string_range_char (object, from, to, from_out, to_out, flags);
1428 else
1429 get_buffer_range_char (XBUFFER (object), from, to, from_out, to_out, flags);
1430 }
1431
1432 void
1433 get_buffer_or_string_range_byte (Lisp_Object object, Lisp_Object from,
1434 Lisp_Object to, Bytebpos *from_out,
1435 Bytebpos *to_out, unsigned int flags)
1436 {
1437 if (STRINGP (object))
1438 get_string_range_byte (object, from, to, from_out, to_out, flags);
1439 else
1440 get_buffer_range_byte (XBUFFER (object), from, to, from_out, to_out, flags);
1441 }
1442
1443 Charbpos
1444 buffer_or_string_accessible_begin_char (Lisp_Object object)
1445 {
1446 return STRINGP (object) ? 0 : BUF_BEGV (XBUFFER (object));
1447 }
1448
1449 Charbpos
1450 buffer_or_string_accessible_end_char (Lisp_Object object)
1451 {
1452 return STRINGP (object) ?
1453 XSTRING_CHAR_LENGTH (object) : BUF_ZV (XBUFFER (object));
1454 }
1455
1456 Bytebpos
1457 buffer_or_string_accessible_begin_byte (Lisp_Object object)
1458 {
1459 return STRINGP (object) ? 0 : BI_BUF_BEGV (XBUFFER (object));
1460 }
1461
1462 Bytebpos
1463 buffer_or_string_accessible_end_byte (Lisp_Object object)
1464 {
1465 return STRINGP (object) ?
1466 XSTRING_LENGTH (object) : BI_BUF_ZV (XBUFFER (object));
1467 }
1468
1469 Charbpos
1470 buffer_or_string_absolute_begin_char (Lisp_Object object)
1471 {
1472 return STRINGP (object) ? 0 : BUF_BEG (XBUFFER (object));
1473 }
1474
1475 Charbpos
1476 buffer_or_string_absolute_end_char (Lisp_Object object)
1477 {
1478 return STRINGP (object) ?
1479 XSTRING_CHAR_LENGTH (object) : BUF_Z (XBUFFER (object));
1480 }
1481
1482 Bytebpos
1483 buffer_or_string_absolute_begin_byte (Lisp_Object object)
1484 {
1485 return STRINGP (object) ? 0 : BI_BUF_BEG (XBUFFER (object));
1486 }
1487
1488 Bytebpos
1489 buffer_or_string_absolute_end_byte (Lisp_Object object)
1490 {
1491 return STRINGP (object) ?
1492 XSTRING_LENGTH (object) : BI_BUF_Z (XBUFFER (object));
1493 }
1494
1495
1496 /************************************************************************/
1497 /* point and marker adjustment */ 108 /* point and marker adjustment */
1498 /************************************************************************/ 109 /************************************************************************/
1499 110
1500 /* just_set_point() is the only place `PT' is an lvalue in all of emacs. 111 /* just_set_point() is the only place `PT' is an lvalue in all of emacs.
1501 This function is called from set_buffer_point(), which is the function 112 This function is called from set_buffer_point(), which is the function
1694 { 305 {
1695 adjust_markers (mbuf, pos, BI_BUF_GPT (mbuf), BUF_GAP_SIZE (mbuf)); 306 adjust_markers (mbuf, pos, BI_BUF_GPT (mbuf), BUF_GAP_SIZE (mbuf));
1696 } 307 }
1697 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 308 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
1698 { 309 {
1699 adjust_extents (make_buffer (mbuf), pos, BI_BUF_GPT (mbuf), 310 adjust_extents (wrap_buffer (mbuf), pos, BI_BUF_GPT (mbuf),
1700 BUF_GAP_SIZE (mbuf)); 311 BUF_GAP_SIZE (mbuf));
1701 } 312 }
1702 SET_BI_BUF_GPT (buf, pos); 313 SET_BI_BUF_GPT (buf, pos);
1703 SET_GAP_SENTINEL (buf); 314 SET_GAP_SENTINEL (buf);
1704 #ifdef ERROR_CHECK_EXTENTS 315 #ifdef ERROR_CHECK_EXTENTS
1705 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 316 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
1706 { 317 {
1707 sledgehammer_extent_check (make_buffer (mbuf)); 318 sledgehammer_extent_check (wrap_buffer (mbuf));
1708 } 319 }
1709 #endif 320 #endif
1710 QUIT; 321 QUIT;
1711 } 322 }
1712 323
1764 { 375 {
1765 adjust_markers (mbuf, BI_BUF_GPT (mbuf) + gsize, pos + gsize, - gsize); 376 adjust_markers (mbuf, BI_BUF_GPT (mbuf) + gsize, pos + gsize, - gsize);
1766 } 377 }
1767 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 378 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
1768 { 379 {
1769 adjust_extents (make_buffer (mbuf), BI_BUF_GPT (mbuf) + gsize, 380 adjust_extents (wrap_buffer (mbuf), BI_BUF_GPT (mbuf) + gsize,
1770 pos + gsize, - gsize); 381 pos + gsize, - gsize);
1771 } 382 }
1772 SET_BI_BUF_GPT (buf, pos); 383 SET_BI_BUF_GPT (buf, pos);
1773 SET_GAP_SENTINEL (buf); 384 SET_GAP_SENTINEL (buf);
1774 #ifdef ERROR_CHECK_EXTENTS 385 #ifdef ERROR_CHECK_EXTENTS
1775 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 386 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
1776 { 387 {
1777 sledgehammer_extent_check (make_buffer (mbuf)); 388 sledgehammer_extent_check (wrap_buffer (mbuf));
1778 } 389 }
1779 #endif 390 #endif
1780 } 391 }
1781 if (pos == BI_BUF_Z (buf)) 392 if (pos == BI_BUF_Z (buf))
1782 { 393 {
2069 end_multiple_change (struct buffer *buf, int count) 680 end_multiple_change (struct buffer *buf, int count)
2070 { 681 {
2071 assert (buf->text->changes->in_multiple_change > 0); 682 assert (buf->text->changes->in_multiple_change > 0);
2072 buf->text->changes->in_multiple_change--; 683 buf->text->changes->in_multiple_change--;
2073 if (!buf->text->changes->in_multiple_change) 684 if (!buf->text->changes->in_multiple_change)
2074 unbind_to (count, Qnil); 685 unbind_to (count);
2075 } 686 }
2076 687
2077 static int inside_change_hook; 688 static int inside_change_hook;
2078 689
2079 static Lisp_Object 690 static Lisp_Object
2113 int speccount = specpdl_depth (); 724 int speccount = specpdl_depth ();
2114 record_unwind_protect (first_change_hook_restore, buffer); 725 record_unwind_protect (first_change_hook_restore, buffer);
2115 set_buffer_internal (buf); 726 set_buffer_internal (buf);
2116 in_first_change = 1; 727 in_first_change = 1;
2117 run_hook (Qfirst_change_hook); 728 run_hook (Qfirst_change_hook);
2118 unbind_to (speccount, Qnil); 729 unbind_to (speccount);
2119 } 730 }
2120 } 731 }
2121 } 732 }
2122 733
2123 /* Signal a change to the buffer immediately before it happens. 734 /* Signal a change to the buffer immediately before it happens.
2201 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 812 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2202 { 813 {
2203 XSETBUFFER (buffer, mbuf); 814 XSETBUFFER (buffer, mbuf);
2204 report_extent_modification (buffer, start, end, 0); 815 report_extent_modification (buffer, start, end, 0);
2205 } 816 }
2206 unbind_to (speccount, Qnil); 817 unbind_to (speccount);
2207 818
2208 /* Only now do we indicate that the before-change-functions have 819 /* Only now do we indicate that the before-change-functions have
2209 been called, in case some function throws out. */ 820 been called, in case some function throws out. */
2210 buf->text->changes->mc_begin_signaled = 1; 821 buf->text->changes->mc_begin_signaled = 1;
2211 } 822 }
2289 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 900 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2290 { 901 {
2291 XSETBUFFER (buffer, mbuf); 902 XSETBUFFER (buffer, mbuf);
2292 report_extent_modification (buffer, start, new_end, 1); 903 report_extent_modification (buffer, start, new_end, 1);
2293 } 904 }
2294 unbind_to (speccount, Qnil); /* sets inside_change_hook back to 0 */ 905 unbind_to (speccount); /* sets inside_change_hook back to 0 */
2295 } 906 }
2296 } 907 }
2297 908
2298 /* Call this if you're about to change the region of BUFFER from START 909 /* Call this if you're about to change the region of BUFFER from START
2299 to END. This checks the read-only properties of the region, calls 910 to END. This checks the read-only properties of the region, calls
2467 if (pos < BUF_BEGV (buf)) 1078 if (pos < BUF_BEGV (buf))
2468 pos = BUF_BEGV (buf); 1079 pos = BUF_BEGV (buf);
2469 if (pos > BUF_ZV (buf)) 1080 if (pos > BUF_ZV (buf))
2470 pos = BUF_ZV (buf); 1081 pos = BUF_ZV (buf);
2471 1082
1083 ind = charbpos_to_bytebpos (buf, pos);
1084
2472 /* string may have been relocated up to this point */ 1085 /* string may have been relocated up to this point */
2473 if (STRINGP (reloc)) 1086 if (STRINGP (reloc))
2474 nonreloc = XSTRING_DATA (reloc); 1087 {
2475 1088 cclen = XSTRING_OFFSET_BYTE_TO_CHAR_LEN (reloc, offset, length);
2476 ind = charbpos_to_bytebpos (buf, pos); 1089 nonreloc = XSTRING_DATA (reloc);
2477 cclen = bytecount_to_charcount (nonreloc + offset, length); 1090 }
1091 else
1092 cclen = bytecount_to_charcount (nonreloc + offset, length);
2478 1093
2479 if (ind != BI_BUF_GPT (buf)) 1094 if (ind != BI_BUF_GPT (buf))
2480 /* #### if debug-on-quit is invoked and the user changes the 1095 /* #### if debug-on-quit is invoked and the user changes the
2481 buffer, bad things can happen. This is a rampant problem 1096 buffer, bad things can happen. This is a rampant problem
2482 in Emacs. */ 1097 in Emacs. */
2511 { 1126 {
2512 SET_BOTH_BUF_ZV (mbuf, BUF_ZV (mbuf) + cclen, BI_BUF_ZV (mbuf) + length); 1127 SET_BOTH_BUF_ZV (mbuf, BUF_ZV (mbuf) + cclen, BI_BUF_ZV (mbuf) + length);
2513 } 1128 }
2514 SET_BOTH_BUF_Z (buf, BUF_Z (buf) + cclen, BI_BUF_Z (buf) + length); 1129 SET_BOTH_BUF_Z (buf, BUF_Z (buf) + cclen, BI_BUF_Z (buf) + length);
2515 SET_GAP_SENTINEL (buf); 1130 SET_GAP_SENTINEL (buf);
2516 1131
1132
2517 #ifdef MULE 1133 #ifdef MULE
2518 buffer_mule_signal_inserted_region (buf, pos, length, cclen); 1134 buffer_mule_signal_inserted_region (buf, pos, length, cclen);
2519 #endif 1135 #endif
2520 1136
2521 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 1137 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2522 { 1138 {
2523 process_extents_for_insertion (make_buffer (mbuf), ind, length); 1139 process_extents_for_insertion (wrap_buffer (mbuf), ind, length);
2524 } 1140 }
2525 1141
2526 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 1142 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2527 { 1143 {
2528 /* We know the gap is at IND so the cast is OK. */ 1144 /* We know the gap is at IND so the cast is OK. */
2736 extents will be present to be recorded, and BEFORE the gap 1352 extents will be present to be recorded, and BEFORE the gap
2737 size is increased, as otherwise we will be confused about 1353 size is increased, as otherwise we will be confused about
2738 where the extents end. */ 1354 where the extents end. */
2739 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 1355 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2740 { 1356 {
2741 process_extents_for_deletion (make_buffer (mbuf), bi_from, bi_to, 0); 1357 process_extents_for_deletion (wrap_buffer (mbuf), bi_from, bi_to, 0);
2742 } 1358 }
2743 1359
2744 /* Relocate all markers pointing into the new, larger gap to 1360 /* Relocate all markers pointing into the new, larger gap to
2745 point at the end of the text before the gap. */ 1361 point at the end of the text before the gap. */
2746 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 1362 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2752 } 1368 }
2753 1369
2754 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 1370 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2755 { 1371 {
2756 /* Relocate any extent endpoints just like markers. */ 1372 /* Relocate any extent endpoints just like markers. */
2757 adjust_extents_for_deletion (make_buffer (mbuf), bi_from, bi_to, 1373 adjust_extents_for_deletion (wrap_buffer (mbuf), bi_from, bi_to,
2758 BUF_GAP_SIZE (mbuf), bc_numdel, 0); 1374 BUF_GAP_SIZE (mbuf), bc_numdel, 0);
2759 } 1375 }
2760 1376
2761 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 1377 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2762 { 1378 {
2810 This must come AFTER record_delete(), so that the appropriate extents 1426 This must come AFTER record_delete(), so that the appropriate extents
2811 will be present to be recorded, and BEFORE the gap size is increased, 1427 will be present to be recorded, and BEFORE the gap size is increased,
2812 as otherwise we will be confused about where the extents end. */ 1428 as otherwise we will be confused about where the extents end. */
2813 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 1429 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2814 { 1430 {
2815 process_extents_for_deletion (make_buffer (mbuf), bi_from, bi_to, 0); 1431 process_extents_for_deletion (wrap_buffer (mbuf), bi_from, bi_to, 0);
2816 } 1432 }
2817 1433
2818 /* Relocate all markers pointing into the new, larger gap to 1434 /* Relocate all markers pointing into the new, larger gap to
2819 point at the end of the text before the gap. */ 1435 point at the end of the text before the gap. */
2820 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 1436 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2826 } 1442 }
2827 1443
2828 /* Relocate any extent endpoints just like markers. */ 1444 /* Relocate any extent endpoints just like markers. */
2829 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 1445 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2830 { 1446 {
2831 adjust_extents_for_deletion (make_buffer (mbuf), bi_from, bi_to, 1447 adjust_extents_for_deletion (wrap_buffer (mbuf), bi_from, bi_to,
2832 BUF_GAP_SIZE (mbuf), 1448 BUF_GAP_SIZE (mbuf),
2833 bc_numdel, BUF_GAP_SIZE (mbuf)); 1449 bc_numdel, BUF_GAP_SIZE (mbuf));
2834 } 1450 }
2835 1451
2836 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 1452 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2862 #endif 1478 #endif
2863 1479
2864 #ifdef ERROR_CHECK_EXTENTS 1480 #ifdef ERROR_CHECK_EXTENTS
2865 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons) 1481 MAP_INDIRECT_BUFFERS (buf, mbuf, bufcons)
2866 { 1482 {
2867 sledgehammer_extent_check (make_buffer (mbuf)); 1483 sledgehammer_extent_check (wrap_buffer (mbuf));
2868 } 1484 }
2869 #endif 1485 #endif
2870 1486
2871 signal_after_change (buf, from, to, from); 1487 signal_after_change (buf, from, to, from);
2872 } 1488 }
3025 memcpy (dest, start1, len1); 1641 memcpy (dest, start1, len1);
3026 memcpy (dest + len1, start2, bi_len - len1); 1642 memcpy (dest + len1, start2, bi_len - len1);
3027 } 1643 }
3028 } 1644 }
3029 1645
1646 init_string_ascii_begin (val);
1647 sledgehammer_check_ascii_begin (val);
1648
3030 UNGCPRO; 1649 UNGCPRO;
3031 return val; 1650 return val;
3032 } 1651 }
3033 1652
3034 Lisp_Object 1653 Lisp_Object
3070 charbpos_to_bytebpos (buf, to), 1689 charbpos_to_bytebpos (buf, to),
3071 iro); 1690 iro);
3072 } 1691 }
3073 } 1692 }
3074 1693
3075 void
3076 find_charsets_in_intbyte_string (unsigned char *charsets, const Intbyte *str,
3077 Bytecount len)
3078 {
3079 #ifndef MULE
3080 /* Telescope this. */
3081 charsets[0] = 1;
3082 #else
3083 const Intbyte *strend = str + len;
3084 memset (charsets, 0, NUM_LEADING_BYTES);
3085
3086 /* #### SJT doesn't like this. */
3087 if (len == 0)
3088 {
3089 charsets[XCHARSET_LEADING_BYTE (Vcharset_ascii) - 128] = 1;
3090 return;
3091 }
3092
3093 while (str < strend)
3094 {
3095 charsets[CHAR_LEADING_BYTE (charptr_emchar (str)) - 128] = 1;
3096 INC_CHARPTR (str);
3097 }
3098 #endif
3099 }
3100
3101 void
3102 find_charsets_in_emchar_string (unsigned char *charsets, const Emchar *str,
3103 Charcount len)
3104 {
3105 #ifndef MULE
3106 /* Telescope this. */
3107 charsets[0] = 1;
3108 #else
3109 int i;
3110
3111 memset (charsets, 0, NUM_LEADING_BYTES);
3112
3113 /* #### SJT doesn't like this. */
3114 if (len == 0)
3115 {
3116 charsets[XCHARSET_LEADING_BYTE (Vcharset_ascii) - 128] = 1;
3117 return;
3118 }
3119
3120 for (i = 0; i < len; i++)
3121 {
3122 charsets[CHAR_LEADING_BYTE (str[i]) - 128] = 1;
3123 }
3124 #endif
3125 }
3126
3127 int
3128 intbyte_string_displayed_columns (const Intbyte *str, Bytecount len)
3129 {
3130 int cols = 0;
3131 const Intbyte *end = str + len;
3132
3133 while (str < end)
3134 {
3135 #ifdef MULE
3136 Emchar ch = charptr_emchar (str);
3137 cols += XCHARSET_COLUMNS (CHAR_CHARSET (ch));
3138 #else
3139 cols++;
3140 #endif
3141 INC_CHARPTR (str);
3142 }
3143
3144 return cols;
3145 }
3146
3147 int
3148 emchar_string_displayed_columns (const Emchar *str, Charcount len)
3149 {
3150 #ifdef MULE
3151 int cols = 0;
3152 int i;
3153
3154 for (i = 0; i < len; i++)
3155 cols += XCHARSET_COLUMNS (CHAR_CHARSET (str[i]));
3156
3157 return cols;
3158 #else /* not MULE */
3159 return len;
3160 #endif
3161 }
3162
3163 /* NOTE: Does not reset the Dynarr. */
3164
3165 void
3166 convert_intbyte_string_into_emchar_dynarr (const Intbyte *str, Bytecount len,
3167 Emchar_dynarr *dyn)
3168 {
3169 const Intbyte *strend = str + len;
3170
3171 while (str < strend)
3172 {
3173 Emchar ch = charptr_emchar (str);
3174 Dynarr_add (dyn, ch);
3175 INC_CHARPTR (str);
3176 }
3177 }
3178
3179 Charcount
3180 convert_intbyte_string_into_emchar_string (const Intbyte *str, Bytecount len,
3181 Emchar *arr)
3182 {
3183 const Intbyte *strend = str + len;
3184 Charcount newlen = 0;
3185 while (str < strend)
3186 {
3187 Emchar ch = charptr_emchar (str);
3188 arr[newlen++] = ch;
3189 INC_CHARPTR (str);
3190 }
3191 return newlen;
3192 }
3193
3194 /* Convert an array of Emchars into the equivalent string representation.
3195 Store into the given Intbyte dynarr. Does not reset the dynarr.
3196 Does not add a terminating zero. */
3197
3198 void
3199 convert_emchar_string_into_intbyte_dynarr (Emchar *arr, int nels,
3200 Intbyte_dynarr *dyn)
3201 {
3202 Intbyte str[MAX_EMCHAR_LEN];
3203 int i;
3204
3205 for (i = 0; i < nels; i++)
3206 {
3207 Bytecount len = set_charptr_emchar (str, arr[i]);
3208 Dynarr_add_many (dyn, str, len);
3209 }
3210 }
3211
3212 /* Convert an array of Emchars into the equivalent string representation.
3213 Malloc the space needed for this and return it. If LEN_OUT is not a
3214 NULL pointer, store into LEN_OUT the number of Intbytes in the
3215 malloc()ed string. Note that the actual number of Intbytes allocated
3216 is one more than this: the returned string is zero-terminated. */
3217
3218 Intbyte *
3219 convert_emchar_string_into_malloced_string (Emchar *arr, int nels,
3220 Bytecount *len_out)
3221 {
3222 /* Damn zero-termination. */
3223 Intbyte *str = (Intbyte *) alloca (nels * MAX_EMCHAR_LEN + 1);
3224 Intbyte *strorig = str;
3225 Bytecount len;
3226
3227 int i;
3228
3229 for (i = 0; i < nels; i++)
3230 str += set_charptr_emchar (str, arr[i]);
3231 *str = '\0';
3232 len = str - strorig;
3233 str = (Intbyte *) xmalloc (1 + len);
3234 memcpy (str, strorig, 1 + len);
3235 if (len_out)
3236 *len_out = len;
3237 return str;
3238 }
3239
3240 1694
3241 /************************************************************************/ 1695 /************************************************************************/
3242 /* initialization */ 1696 /* initialization */
3243 /************************************************************************/ 1697 /************************************************************************/
3244 1698
3245 void 1699 void
3246 reinit_vars_of_insdel (void) 1700 reinit_vars_of_insdel (void)
3247 { 1701 {
3248 int i;
3249
3250 inside_change_hook = 0; 1702 inside_change_hook = 0;
3251 in_first_change = 0; 1703 in_first_change = 0;
3252
3253 for (i = 0; i <= MAX_BYTEBPOS_GAP_SIZE_3; i++)
3254 three_to_one_table[i] = i / 3;
3255 } 1704 }
3256 1705
3257 void 1706 void
3258 vars_of_insdel (void) 1707 vars_of_insdel (void)
3259 { 1708 {
3281 1730
3282 b->text->mule_bufmin = b->text->mule_bufmax = 1; 1731 b->text->mule_bufmin = b->text->mule_bufmax = 1;
3283 b->text->mule_bytmin = b->text->mule_bytmax = 1; 1732 b->text->mule_bytmin = b->text->mule_bytmax = 1;
3284 b->text->mule_shifter = 0; 1733 b->text->mule_shifter = 0;
3285 b->text->mule_three_p = 0; 1734 b->text->mule_three_p = 0;
1735 b->text->entirely_ascii_p = 1;
3286 1736
3287 for (i = 0; i < 16; i++) 1737 for (i = 0; i < 16; i++)
3288 { 1738 {
3289 b->text->mule_charbpos_cache[i] = 1; 1739 b->text->mule_charbpos_cache[i] = 1;
3290 b->text->mule_bytebpos_cache[i] = 1; 1740 b->text->mule_bytebpos_cache[i] = 1;