comparison src/buffer.h @ 2367:ecf1ebac70d8

[xemacs-hg @ 2004-11-04 23:05:23 by ben] commit mega-patch configure.in: Turn off -Winline and -Wchar-subscripts. Use the right set of cflags when compiling modules. Rewrite ldap configuration to separate the inclusion of lber (needed in recent Cygwin) from the basic checks for the needed libraries. add a function for MAKE_JUNK_C; initially code was added to generate xemacs.def using this, but it will need to be rewritten. add an rm -f for junk.c to avoid weird Cygwin bug with cp -f onto an existing file. Sort list of auto-detected functions and eliminate unused checks for stpcpy, setlocale and getwd. Add autodetection of Cygwin scanf problems BETA: Rewrite section on configure to indicate what flags are important and what not. digest-doc.c, make-dump-id.c, profile.c, sorted-doc.c: Add proper decls for main(). make-msgfile.c: Document that this is old junk. Move proposal to text.c. make-msgfile.lex: Move proposal to text.c. make-mswin-unicode.pl: Convert error-generating code so that the entire message will be seen as a single unrecognized token. mule/mule-ccl.el: Update docs. lispref/mule.texi: Update CCL docs. ldap/eldap.c: Mule-ize. Use EXTERNAL_LIST_LOOP_2 instead of deleted EXTERNAL_LIST_LOOP. * XEmacs 21.5.18 "chestnut" is released. --------------------------------------------------------------- MULE-RELATED WORK: --------------------------------------------------------------- --------------------------- byte-char conversion --------------------------- buffer.c, buffer.h, insdel.c, text.c: Port FSF algorithm for byte-char conversion, replacing broken previous version. Track the char position of the gap. Add functions to do char-byte conversion downwards as well as upwards. Move comments about algorithm workings to internals manual. --------------------------- work on types --------------------------- alloc.c, console-x-impl.h, dump-data.c, dump-data.h, dumper.c, dialog-msw.c, dired-msw.c, doc.c, editfns.c, esd.c, event-gtk.h, event-msw.c, events.c, file-coding.c, file-coding.h, fns.c, glyphs-eimage.c, glyphs-gtk.c, glyphs-msw.c, glyphs-shared.c, glyphs-x.c, glyphs.c, glyphs.h, gui.c, hpplay.c, imgproc.c, intl-win32.c, lrecord.h, lstream.c, keymap.c, lisp.h, libsst.c, linuxplay.c, miscplay.c, miscplay.h, mule-coding.c, nas.c, nt.c, ntheap.c, ntplay.c, objects-msw.c, objects-tty.c, objects-x.c, print.c, process-nt.c, process.c, redisplay.h, select-common.h, select-gtk.c, select-x.c, sgiplay.c, sound.c, sound.h, sunplay.c, sysfile.h, sysdep.c, syswindows.h, text.c, unexnt.c, win32.c, xgccache.c: Further work on types. This creates a full set of types for all the basic semantics of `char' that I have so far identified, so that its semantics can always be identified for the purposes of proper Mule-safe code, and the raw use of `char' always avoided. (1) More type renaming, for consistency of naming. Char_ASCII -> Ascbyte UChar_ASCII -> UAscbyte Char_Binary -> CBinbyte UChar_Binary -> Binbyte SChar_Binary -> SBinbyte (2) Introduce Rawbyte, CRawbyte, Boolbyte, Chbyte, UChbyte, and Bitbyte and use them. (3) New types Itext, Wexttext and Textcount for separating out the concepts of bytes and textual units (different under UTF-16 and UTF-32, which are potential internal encodings). (4) qxestr*_c -> qxestr*_ascii. lisp.h: New; goes with other qxe() functions. #### Maybe goes in a different section. lisp.h: Group generic int-type defs together with EMACS_INT defs. lisp.h: * lisp.h (WEXTTEXT_IS_WIDE) New defns. lisp.h: New type to replace places where int occurs as a boolean. It's signed because occasionally people may want to use -1 as an error value, and because unsigned ints are viral -- see comments in the internals manual against using them. dynarr.c: int -> Bytecount. --------------------------- Mule-izing --------------------------- device-x.c: Partially Mule-ize. dumper.c, dumper.h: Mule-ize. Use Rawbyte. Use stderr_out not printf. Use wext_*(). sysdep.c, syswindows.h, text.c: New Wexttext API for manipulation of external text that may be Unicode (e.g. startup code under Windows). emacs.c: Mule-ize. Properly deal with argv in external encoding. Use wext_*() and Wexttext. Use Rawbyte. #if 0 some old junk on SCO that is unlikely to be correct. Rewrite allocation code in run-temacs. emacs.c, symsinit.h, win32.c: Rename win32 init function and call it even earlier, to initialize mswindows_9x_p even earlier, for use in startup code (XEUNICODE_P). process.c: Use _wenviron not environ under Windows, to get Unicode environment variables. event-Xt.c: Mule-ize drag-n-drop related stuff. dragdrop.c, dragdrop.h, frame-x.c: Mule-ize. text.h: Add some more stand-in defines for particular kinds of conversion; use in Mule-ization work in frame-x.c etc. --------------------------- Freshening --------------------------- intl-auto-encap-win32.c, intl-auto-encap-win32.h: Regenerate. --------------------------- Unicode-work --------------------------- intl-win32.c, syswindows.h: Factor out common options to MultiByteToWideChar and WideCharToMultiByte. Add convert_unicode_to_multibyte_malloc() and convert_unicode_to_multibyte_dynarr() and use. Add stuff for alloca() conversion of multibyte/unicode. alloc.c: Use dfc_external_data_len() in case of unicode coding system. alloc.c, mule-charset.c: Don't zero out and reinit charset Unicode tables. This fucks up dump-time loading. Anyway, either we load them at dump time or run time, never both. unicode.c: Dump the blank tables as well. --------------------------------------------------------------- DOCUMENTATION, MOSTLY MULE-RELATED: --------------------------------------------------------------- EmacsFrame.c, emodules.c, event-Xt.c, fileio.c, input-method-xlib.c, mule-wnnfns.c, redisplay-gtk.c, redisplay-tty.c, redisplay-x.c, regex.c, sysdep.c: Add comment about Mule work needed. text.h: Add more documentation describing why DFC routines were not written to return their value. Add some other DFC documentation. console-msw.c, console-msw.h: Add pointer to docs in win32.c. emacs.c: Add comments on sources of doc info. text.c, charset.h, unicode.c, intl-win32.c, intl-encap-win32.c, text.h, file-coding.c, mule-coding.c: Collect background comments and related to text matters and internationalization, and proposals for work to be done, in text.c or Internals manual, stuff related to specific textual API's in text.h, and stuff related to internal implementation of Unicode conversion in unicode.c. Put lots of pointers to the comments to make them easier to find. s/mingw32.h, s/win32-common.h, s/win32-native.h, s/windowsnt.h, win32.c: Add bunches of new documentation on the different kinds of builds and environments under Windows and how they work. Collect this info in win32.c. Add pointers to these docs in the relevant s/* files. emacs.c: Document places with long comments. Remove comment about exiting, move to internals manual, put in pointer. event-stream.c: Move docs about event queues and focus to internals manual, put in pointer. events.h: Move docs about event stream callbacks to internals manual, put in pointer. profile.c, redisplay.c, signal.c: Move documentation to the Internals manual. process-nt.c: Add pointer to comment in win32-native.el. lisp.h: Add comments about some comment conventions. lisp.h: Add comment about the second argument. device-msw.c, redisplay-msw.c: @@#### comments are out-of-date. --------------------------------------------------------------- PDUMP WORK (MOTIVATED BY UNICODE CHANGES) --------------------------------------------------------------- alloc.c, buffer.c, bytecode.c, console-impl.h, console.c, device.c, dumper.c, lrecord.h, elhash.c, emodules.h, events.c, extents.c, frame.c, glyphs.c, glyphs.h, mule-charset.c, mule-coding.c, objects.c, profile.c, rangetab.c, redisplay.c, specifier.c, specifier.h, window.c, lstream.c, file-coding.h, file-coding.c: PDUMP: Properly implement dump_add_root_block(), which never worked before, and is necessary for dumping Unicode tables. Pdump name changes for accuracy: XD_STRUCT_PTR -> XD_BLOCK_PTR. XD_STRUCT_ARRAY -> XD_BLOCK_ARRAY. XD_C_STRING -> XD_ASCII_STRING. *_structure_* -> *_block_*. lrecord.h: some comments added about dump_add_root_block() vs dump_add_root_block_ptr(). extents.c: remove incorrect comment about pdump problems with gap array. --------------------------------------------------------------- ALLOCATION --------------------------------------------------------------- abbrev.c, alloc.c, bytecode.c, casefiddle.c, device-msw.c, device-x.c, dired-msw.c, doc.c, doprnt.c, dragdrop.c, editfns.c, emodules.c, file-coding.c, fileio.c, filelock.c, fns.c, glyphs-eimage.c, glyphs-gtk.c, glyphs-msw.c, glyphs-x.c, gui-msw.c, gui-x.c, imgproc.c, intl-win32.c, lread.c, menubar-gtk.c, menubar.c, nt.c, objects-msw.c, objects-x.c, print.c, process-nt.c, process-unix.c, process.c, realpath.c, redisplay.c, search.c, select-common.c, symbols.c, sysdep.c, syswindows.h, text.c, text.h, ui-byhand.c: New macros {alloca,xnew}_{itext,{i,ext,raw,bin,asc}bytes} for more convenient allocation of these commonly requested items. Modify functions to use alloca_ibytes, alloca_array, alloca_extbytes, xnew_ibytes, etc. also XREALLOC_ARRAY, xnew. alloc.c: Rewrite the allocation functions to factor out repeated code. Add assertions for freeing dumped data. lisp.h: Moved down and consolidated with other allocation stuff. lisp.h, dynarr.c: New functions for allocation that's very efficient when mostly in LIFO order. lisp.h, text.c, text.h: Factor out some stuff for general use by alloca()-conversion funs. text.h, lisp.h: Fill out convenience routines for allocating various kinds of bytes and put them in lisp.h. Use them in place of xmalloc(), ALLOCA(). text.h: Fill out the convenience functions so the _MALLOC() kinds match the alloca() kinds. --------------------------------------------------------------- ERROR-CHECKING --------------------------------------------------------------- text.h: Create ASSERT_ASCTEXT_ASCII() and ASSERT_ASCTEXT_ASCII_LEN() from similar Eistring checkers and change the Eistring checkers to use them instead. --------------------------------------------------------------- MACROS IN LISP.H --------------------------------------------------------------- lisp.h: Redo GCPRO declarations. Create a "base" set of functions that can be used to generate any kind of gcpro sets -- regular, ngcpro, nngcpro, private ones used in GC_EXTERNAL_LIST_LOOP_2. buffer.c, callint.c, chartab.c, console-msw.c, device-x.c, dialog-msw.c, dired.c, extents.c, ui-gtk.c, rangetab.c, nt.c, mule-coding.c, minibuf.c, menubar-msw.c, menubar.c, menubar-gtk.c, lread.c, lisp.h, gutter.c, glyphs.c, glyphs-widget.c, fns.c, fileio.c, file-coding.c, specifier.c: Eliminate EXTERNAL_LIST_LOOP, which does not check for circularities. Use EXTERNAL_LIST_LOOP_2 instead or EXTERNAL_LIST_LOOP_3 or EXTERNAL_PROPERTY_LIST_LOOP_3 or GC_EXTERNAL_LIST_LOOP_2 (new macro). Removed/redid comments on EXTERNAL_LIST_LOOP. --------------------------------------------------------------- SPACING FIXES --------------------------------------------------------------- callint.c, hftctl.c, number-gmp.c, process-unix.c: Spacing fixes. --------------------------------------------------------------- FIX FOR GEOMETRY PROBLEM IN FIRST FRAME --------------------------------------------------------------- unicode.c: Add workaround for newlib bug in sscanf() [should be fixed by release 1.5.12 of Cygwin]. toolbar.c: bug fix for problem of initial frame being 77 chars wide on Windows. will be overridden by my other ws. --------------------------------------------------------------- FIX FOR LEAKING PROCESS HANDLES: --------------------------------------------------------------- process-nt.c: Fixes for leaking handles. Inspired by work done by Adrian Aichner <adrian@xemacs.org>. --------------------------------------------------------------- FIX FOR CYGWIN BUG (Unicode-related): --------------------------------------------------------------- unicode.c: Add workaround for newlib bug in sscanf() [should be fixed by release 1.5.12 of Cygwin]. --------------------------------------------------------------- WARNING FIXES: --------------------------------------------------------------- console-stream.c: `reinit' is unused. compiler.h, event-msw.c, frame-msw.c, intl-encap-win32.c, text.h: Add stuff to deal with ANSI-aliasing warnings I got. regex.c: Gather includes together to avoid warning. --------------------------------------------------------------- CHANGES TO INITIALIZATION ROUTINES: --------------------------------------------------------------- buffer.c, emacs.c, console.c, debug.c, device-x.c, device.c, dragdrop.c, emodules.c, eval.c, event-Xt.c, event-gtk.c, event-msw.c, event-stream.c, event-tty.c, events.c, extents.c, faces.c, file-coding.c, fileio.c, font-lock.c, frame-msw.c, glyphs-widget.c, glyphs.c, gui-x.c, insdel.c, lread.c, lstream.c, menubar-gtk.c, menubar-x.c, minibuf.c, mule-wnnfns.c, objects-msw.c, objects.c, print.c, scrollbar-x.c, search.c, select-x.c, text.c, undo.c, unicode.c, window.c, symsinit.h: Call reinit_*() functions directly from emacs.c, for clarity. Factor out some redundant init code. Move disallowed stuff that had crept into vars_of_glyphs() into complex_vars_of_glyphs(). Call init_eval_semi_early() from eval.c not in the middle of vars_of_() in emacs.c since there should be no order dependency in the latter calls. --------------------------------------------------------------- ARMAGEDDON: --------------------------------------------------------------- alloc.c, emacs.c, lisp.h, print.c: Rename inhibit_non_essential_printing_operations to inhibit_non_essential_conversion_operations. text.c: Assert on !inhibit_non_essential_conversion_operations. console-msw.c, print.c: Don't do conversion in SetConsoleTitle or FindWindow to avoid problems during armageddon. Put #errors for NON_ASCII_INTERNAL_FORMAT in places where problems would arise. --------------------------------------------------------------- CHANGES TO THE BUILD PROCEDURE: --------------------------------------------------------------- config.h.in, s/cxux.h, s/usg5-4-2.h, m/powerpc.h: Add comment about correct ordering of this file. Rearrange everything to follow this -- put all #undefs together and before the s&m files. Add undefs for HAVE_ALLOCA, C_ALLOCA, BROKEN_ALLOCA_IN_FUNCTION_CALLS, STACK_DIRECTION. Remove unused HAVE_STPCPY, HAVE_GETWD, HAVE_SETLOCALE. m/gec63.h: Deleted; totally broken, not used at all, not in FSF. m/7300.h, m/acorn.h, m/alliant-2800.h, m/alliant.h, m/altos.h, m/amdahl.h, m/apollo.h, m/att3b.h, m/aviion.h, m/celerity.h, m/clipper.h, m/cnvrgnt.h, m/convex.h, m/cydra5.h, m/delta.h, m/delta88k.h, m/dpx2.h, m/elxsi.h, m/ews4800r.h, m/gould.h, m/hp300bsd.h, m/hp800.h, m/hp9000s300.h, m/i860.h, m/ibmps2-aix.h, m/ibmrs6000.h, m/ibmrt-aix.h, m/ibmrt.h, m/intel386.h, m/iris4d.h, m/iris5d.h, m/iris6d.h, m/irist.h, m/isi-ov.h, m/luna88k.h, m/m68k.h, m/masscomp.h, m/mg1.h, m/mips-nec.h, m/mips-siemens.h, m/mips.h, m/news.h, m/nh3000.h, m/nh4000.h, m/ns32000.h, m/orion105.h, m/pfa50.h, m/plexus.h, m/pmax.h, m/powerpc.h, m/pyrmips.h, m/sequent-ptx.h, m/sequent.h, m/sgi-challenge.h, m/symmetry.h, m/tad68k.h, m/tahoe.h, m/targon31.h, m/tekxd88.h, m/template.h, m/tower32.h, m/tower32v3.h, m/ustation.h, m/vax.h, m/wicat.h, m/xps100.h: Delete C_ALLOCA, HAVE_ALLOCA, STACK_DIRECTION, BROKEN_ALLOCA_IN_FUNCTION_CALLS. All of this is auto-detected. When in doubt, I followed recent FSF sources, which also have these things deleted.
author ben
date Thu, 04 Nov 2004 23:08:28 +0000
parents ba4677f54a05
children 6fa9919a9a0b
comparison
equal deleted inserted replaced
2366:2a392e0c390a 2367:ecf1ebac70d8
1 /* Header file for the buffer manipulation primitives. 1 /* Header file for the buffer manipulation primitives.
2 Copyright (C) 1985, 1986, 1992, 1993, 1994, 1995 2 Copyright (C) 1985, 1986, 1992, 1993, 1994, 1995
3 Free Software Foundation, Inc. 3 Free Software Foundation, Inc.
4 Copyright (C) 1995 Sun Microsystems, Inc. 4 Copyright (C) 1995 Sun Microsystems, Inc.
5 Copyright (C) 2001, 2002 Ben Wing. 5 Copyright (C) 2001, 2002, 2004 Ben Wing.
6 6
7 This file is part of XEmacs. 7 This file is part of XEmacs.
8 8
9 XEmacs is free software; you can redistribute it and/or modify it 9 XEmacs is free software; you can redistribute it and/or modify it
10 under the terms of the GNU General Public License as published by the 10 under the terms of the GNU General Public License as published by the
72 children pointers. This is not terribly hard, though, and the 72 children pointers. This is not terribly hard, though, and the
73 code to maintain this is just like the code already present in 73 code to maintain this is just like the code already present in
74 extent-parent and extent-children. 74 extent-parent and extent-children.
75 */ 75 */
76 76
77 #define NUM_CACHED_POSITIONS 50
78 #define NUM_MOVED_POSITIONS 10
79
77 struct buffer_text 80 struct buffer_text
78 { 81 {
79 Ibyte *beg; /* Actual address of buffer contents. */ 82 Ibyte *beg; /* Actual address of buffer contents. */
80 Bytebpos gpt; /* Index of gap in buffer. */ 83 Bytebpos gpt; /* Index of gap in buffer. */
84 Charbpos bufgpt; /* Equivalent as a Charbpos. */
81 Bytebpos z; /* Index of end of buffer. */ 85 Bytebpos z; /* Index of end of buffer. */
82 Charbpos bufz; /* Equivalent as a Charbpos. */ 86 Charbpos bufz; /* Equivalent as a Charbpos. */
83 Bytecount gap_size;/* Size of buffer's gap */ 87 Bytecount gap_size;/* Size of buffer's gap */
84 Bytecount end_gap_size;/* Size of buffer's end gap */ 88 Bytecount end_gap_size;/* Size of buffer's end gap */
85 long modiff; /* This counts buffer-modification events 89 long modiff; /* This counts buffer-modification events
86 for this buffer. It is incremented for 90 for this buffer. It is incremented for
87 each such event, and never otherwise 91 each such event, and never otherwise
88 changed. */ 92 changed. */
89 long save_modiff; /* Previous value of modiff, as of last 93 long save_modiff; /* Previous value of modiff, as of last
90 time buffer visited or saved a file. */ 94 time buffer visited or saved a file. */
91 95
92 #ifdef MULE 96 #ifdef MULE
97
98 #ifdef OLD_BYTE_CHAR
93 /* We keep track of a "known" region for very fast access. This 99 /* We keep track of a "known" region for very fast access. This
94 information is text-only so it goes here. We update this at each 100 information is text-only so it goes here. We update this at each
95 change to the buffer, so if it's entirely ASCII, these will always 101 change to the buffer, so if it's entirely ASCII, these will always
96 contain the minimum and maximum positions of the buffer. */ 102 contain the minimum and maximum positions of the buffer. */
97 Charbpos mule_bufmin, mule_bufmax; 103 Charbpos mule_bufmin, mule_bufmax;
98 Bytebpos mule_bytmin, mule_bytmax; 104 Bytebpos mule_bytmin, mule_bytmax;
99 int mule_shifter, mule_three_p; 105 int mule_shifter, mule_three_p;
100 106 #endif
101 /* And we also cache 16 positions for fairly fast access near those 107
102 positions. */ 108 /* And we also cache NUM_CACHED_POSITIONS positions for fairly fast
103 Charbpos mule_charbpos_cache[16]; 109 access near those positions. */
104 Bytebpos mule_bytebpos_cache[16]; 110 Charbpos mule_charbpos_cache[NUM_CACHED_POSITIONS];
111 Bytebpos mule_bytebpos_cache[NUM_CACHED_POSITIONS];
112 int next_cache_pos;
113
114 Charbpos cached_charpos;
115 Bytebpos cached_bytepos;
105 116
106 /* True if all chars fit into one byte; 117 /* True if all chars fit into one byte;
107 == (format == FORMAT_8_BIT_FIXED || 118 == (format == FORMAT_8_BIT_FIXED ||
108 (format == FORMAT_DEFAULT && num_ascii_chars == bufz - 1)) 119 (format == FORMAT_DEFAULT && num_ascii_chars == bufz - 1))
109 kept around to speed up (slightly?) the byte-char conversion routines. */ 120 kept around to speed up (slightly?) the byte-char conversion routines. */
115 /* Number of chars in buffer that would fit in an 16-bit-fixed buffer. */ 126 /* Number of chars in buffer that would fit in an 16-bit-fixed buffer. */
116 Charcount num_16_bit_fixed_chars; 127 Charcount num_16_bit_fixed_chars;
117 128
118 /* Currently we only handle 8 bit fixed and default */ 129 /* Currently we only handle 8 bit fixed and default */
119 Internal_Format format; 130 Internal_Format format;
120 #endif 131 #endif /* MULE */
121 132
122 /* Similar to the above, we keep track of positions for which line 133 /* Similar to the above, we keep track of positions for which line
123 number has last been calculated. See line-number.c. */ 134 number has last been calculated. See line-number.c. */
124 Lisp_Object line_number_cache; 135 Lisp_Object line_number_cache;
125 136
309 #define BUF_ZV(buf) ((buf)->bufzv + 0) 320 #define BUF_ZV(buf) ((buf)->bufzv + 0)
310 321
311 /* End of buffer. */ 322 /* End of buffer. */
312 #define BYTE_BUF_Z(buf) ((buf)->text->z + 0) 323 #define BYTE_BUF_Z(buf) ((buf)->text->z + 0)
313 #define BUF_Z(buf) ((buf)->text->bufz + 0) 324 #define BUF_Z(buf) ((buf)->text->bufz + 0)
325
326 /* Gap location. */
327 #define BYTE_BUF_GPT(buf) ((buf)->text->gpt + 0)
328 #define BUF_GPT(buf) ((buf)->text->bufgpt + 0)
314 329
315 /* Point. */ 330 /* Point. */
316 #define BYTE_BUF_PT(buf) ((buf)->pt + 0) 331 #define BYTE_BUF_PT(buf) ((buf)->pt + 0)
317 #define BUF_PT(buf) ((buf)->bufpt + 0) 332 #define BUF_PT(buf) ((buf)->bufpt + 0)
318 333
534 549
535 /*----------------------------------------------------------------------*/ 550 /*----------------------------------------------------------------------*/
536 /* Converting between byte and character positions */ 551 /* Converting between byte and character positions */
537 /*----------------------------------------------------------------------*/ 552 /*----------------------------------------------------------------------*/
538 553
554 /*
555
556 Info on Byte-Char conversion:
557
558 (Info-goto-node "(internals)Byte-Char Position Conversion")
559 */
560
539 #ifdef MULE 561 #ifdef MULE
540
541 /* The basic algorithm we use is to keep track of a known region of
542 characters in each buffer, all of which are of the same width. We keep
543 track of the boundaries of the region in both Charbpos and Bytebpos
544 coordinates and also keep track of the char width, which is 1 - 4 bytes.
545 If the position we're translating is not in the known region, then we
546 invoke a function to update the known region to surround the position in
547 question. This assumes locality of reference, which is usually the
548 case.
549
550 Note that the function to update the known region can be simple or
551 complicated depending on how much information we cache. In addition to
552 the known region, we always cache the correct conversions for point,
553 BEGV, and ZV, and in addition to this we cache 16 positions where the
554 conversion is known. We only look in the cache or update it when we
555 need to move the known region more than a certain amount (currently 50
556 chars), and then we throw away a "random" value and replace it with the
557 newly calculated value.
558
559 Finally, we maintain an extra flag that tracks whether the buffer is
560 entirely ASCII, to speed up the conversions even more. This flag is
561 actually of dubious value because in an entirely-ASCII buffer the known
562 region will always span the entire buffer (in fact, we update the flag
563 based on this fact), and so all we're saving is a few machine cycles.
564
565 A potentially smarter method than what we do with known regions and
566 cached positions would be to keep some sort of pseudo-extent layer over
567 the buffer; maybe keep track of the charbpos/bytebpos correspondence at the
568 beginning of each line, which would allow us to do a binary search over
569 the pseudo-extents to narrow things down to the correct line, at which
570 point you could use a linear movement method. This would also mesh well
571 with efficiently implementing a line-numbering scheme. However, you
572 have to weigh the amount of time spent updating the cache vs. the
573 savings that result from it. In reality, we modify the buffer far less
574 often than we access it, so a cache of this sort that provides
575 guaranteed LOG (N) performance (or perhaps N * LOG (N), if we set a
576 maximum on the cache size) would indeed be a win, particularly in very
577 large buffers. If we ever implement this, we should probably set a
578 reasonably high minimum below which we use the old method, because the
579 time spent updating the fancy cache would likely become dominant when
580 making buffer modifications in smaller buffers.
581
582 Note also that we have to multiply or divide by the char width in order
583 to convert the positions. We do some tricks to avoid ever actually
584 having to do a multiply or divide, because that is typically an
585 expensive operation (esp. divide). Multiplying or dividing by 1, 2, or
586 4 can be implemented simply as a shift left or shift right, and we keep
587 track of a shifter value (0, 1, or 2) indicating how much to shift.
588 Multiplying by 3 can be implemented by doubling and then adding the
589 original value. Dividing by 3, alas, cannot be implemented in any
590 simple shift/subtract method, as far as I know; so we just do a table
591 lookup. For simplicity, we use a table of size 128K, which indexes the
592 "divide-by-3" values for the first 64K non-negative numbers. (Note that
593 we can increase the size up to 384K, i.e. indexing the first 192K
594 non-negative numbers, while still using shorts in the array.) This also
595 means that the size of the known region can be at most 64K for
596 width-three characters.
597
598 !!#### We should investigate the algorithm in GNU Emacs. I think it
599 does something similar, but it may differ in some details, and it's
600 worth seeing if anything can be gleaned.
601 */
602 562
603 Bytebpos charbpos_to_bytebpos_func (struct buffer *buf, Charbpos x); 563 Bytebpos charbpos_to_bytebpos_func (struct buffer *buf, Charbpos x);
604 Charbpos bytebpos_to_charbpos_func (struct buffer *buf, Bytebpos x); 564 Charbpos bytebpos_to_charbpos_func (struct buffer *buf, Bytebpos x);
605 extern short three_to_one_table[]; 565 extern short three_to_one_table[];
606 566
621 retval = (Bytebpos) x; 581 retval = (Bytebpos) x;
622 else if (BUF_FORMAT (buf) == FORMAT_16_BIT_FIXED) 582 else if (BUF_FORMAT (buf) == FORMAT_16_BIT_FIXED)
623 retval = (Bytebpos) (x << 1); 583 retval = (Bytebpos) (x << 1);
624 else if (BUF_FORMAT (buf) == FORMAT_32_BIT_FIXED) 584 else if (BUF_FORMAT (buf) == FORMAT_32_BIT_FIXED)
625 retval = (Bytebpos) (x << 2); 585 retval = (Bytebpos) (x << 2);
586 #ifdef OLD_BYTE_CHAR
626 else if (x >= buf->text->mule_bufmin && x <= buf->text->mule_bufmax) 587 else if (x >= buf->text->mule_bufmin && x <= buf->text->mule_bufmax)
627 retval = (buf->text->mule_bytmin + 588 retval = (buf->text->mule_bytmin +
628 ((x - buf->text->mule_bufmin) << buf->text->mule_shifter) + 589 ((x - buf->text->mule_bufmin) << buf->text->mule_shifter) +
629 (buf->text->mule_three_p ? (x - buf->text->mule_bufmin) : 590 (buf->text->mule_three_p ? (x - buf->text->mule_bufmin) :
630 (Bytebpos) 0)); 591 (Bytebpos) 0));
592 #endif /* OLD_BYTE_CHAR */
631 else 593 else
632 retval = charbpos_to_bytebpos_func (buf, x); 594 retval = charbpos_to_bytebpos_func (buf, x);
633 #else 595 #else
634 retval = (Bytebpos) x; 596 retval = (Bytebpos) x;
635 #endif 597 #endif
652 retval = (Charbpos) x; 614 retval = (Charbpos) x;
653 else if (BUF_FORMAT (buf) == FORMAT_16_BIT_FIXED) 615 else if (BUF_FORMAT (buf) == FORMAT_16_BIT_FIXED)
654 retval = (Charbpos) (x >> 1); 616 retval = (Charbpos) (x >> 1);
655 else if (BUF_FORMAT (buf) == FORMAT_32_BIT_FIXED) 617 else if (BUF_FORMAT (buf) == FORMAT_32_BIT_FIXED)
656 retval = (Charbpos) (x >> 2); 618 retval = (Charbpos) (x >> 2);
619 #ifdef OLD_BYTE_CHAR
657 else if (x >= buf->text->mule_bytmin && x <= buf->text->mule_bytmax) 620 else if (x >= buf->text->mule_bytmin && x <= buf->text->mule_bytmax)
658 retval = (buf->text->mule_bufmin + 621 retval = (buf->text->mule_bufmin +
659 ((buf->text->mule_three_p 622 ((buf->text->mule_three_p
660 ? three_to_one_table[x - buf->text->mule_bytmin] 623 ? three_to_one_table[x - buf->text->mule_bytmin]
661 : (x - buf->text->mule_bytmin) >> buf->text->mule_shifter))); 624 : (x - buf->text->mule_bytmin) >> buf->text->mule_shifter)));
625 #endif /* OLD_BYTE_CHAR */
662 else 626 else
663 retval = bytebpos_to_charbpos_func (buf, x); 627 retval = bytebpos_to_charbpos_func (buf, x);
664 #else 628 #else
665 retval = (Charbpos) x; 629 retval = (Charbpos) x;
666 #endif 630 #endif
1069 Currently there will be at most two iterations in the loop, but it is 1033 Currently there will be at most two iterations in the loop, but it is
1070 written in such a way that it will still work if the buffer 1034 written in such a way that it will still work if the buffer
1071 representation is changed to have multiple gaps in it. 1035 representation is changed to have multiple gaps in it.
1072 */ 1036 */
1073 1037
1074
1075 /* Return the maximum position in the buffer it is safe to scan forwards 1038 /* Return the maximum position in the buffer it is safe to scan forwards
1076 past N to. This is used to prevent buffer scans from running into 1039 past N to. This is used to prevent buffer scans from running into
1077 the gap (e.g. search.c). All characters between N and CEILING_OF(N) 1040 the gap (e.g. search.c). All characters between N and CEILING_OF(N)
1078 are located contiguous in memory. Note that the character *at* 1041 are located contiguous in memory. Note that the character *at*
1079 CEILING_OF(N) is not contiguous in memory. */ 1042 CEILING_OF(N) is not contiguous in memory. */
1080 #define BYTE_BUF_CEILING_OF(b, n) \ 1043 #define BYTE_BUF_CEILING_OF(b, n) \
1081 ((n) < (b)->text->gpt && (b)->text->gpt < BYTE_BUF_ZV (b) ? \ 1044 ((n) < BYTE_BUF_GPT (b) && BYTE_BUF_GPT (b) < BYTE_BUF_ZV (b) ? \
1082 (b)->text->gpt : BYTE_BUF_ZV (b)) 1045 BYTE_BUF_GPT (b) : BYTE_BUF_ZV (b))
1083 #define BUF_CEILING_OF(b, n) \ 1046 #define BUF_CEILING_OF(b, n) \
1084 bytebpos_to_charbpos (b, BYTE_BUF_CEILING_OF (b, charbpos_to_bytebpos (b, n))) 1047 ((n) < BUF_GPT (b) && BUF_GPT (b) < BUF_ZV (b) ? \
1048 BUF_GPT (b) : BUF_ZV (b))
1085 1049
1086 /* Return the minimum position in the buffer it is safe to scan backwards 1050 /* Return the minimum position in the buffer it is safe to scan backwards
1087 past N to. All characters between FLOOR_OF(N) and N are located 1051 past N to. All characters between FLOOR_OF(N) and N are located
1088 contiguous in memory. Note that the character *at* N may not be 1052 contiguous in memory. Note that the character *at* N may not be
1089 contiguous in memory. */ 1053 contiguous in memory. */
1090 #define BYTE_BUF_FLOOR_OF(b, n) \ 1054 #define BYTE_BUF_FLOOR_OF(b, n) \
1091 (BYTE_BUF_BEGV (b) < (b)->text->gpt && (b)->text->gpt < (n) ? \ 1055 (BYTE_BUF_BEGV (b) < BYTE_BUF_GPT (b) && BYTE_BUF_GPT (b) < (n) ? \
1092 (b)->text->gpt : BYTE_BUF_BEGV (b)) 1056 BYTE_BUF_GPT (b) : BYTE_BUF_BEGV (b))
1093 #define BUF_FLOOR_OF(b, n) \ 1057 #define BUF_FLOOR_OF(b, n) \
1094 bytebpos_to_charbpos (b, BYTE_BUF_FLOOR_OF (b, charbpos_to_bytebpos (b, n))) 1058 (BUF_BEGV (b) < BUF_GPT (b) && BUF_GPT (b) < (n) ? \
1059 BUF_GPT (b) : BUF_BEGV (b))
1095 1060
1096 #define BYTE_BUF_CEILING_OF_IGNORE_ACCESSIBLE(b, n) \ 1061 #define BYTE_BUF_CEILING_OF_IGNORE_ACCESSIBLE(b, n) \
1097 ((n) < (b)->text->gpt && (b)->text->gpt < BYTE_BUF_Z (b) ? \ 1062 ((n) < BYTE_BUF_GPT (b) && BYTE_BUF_GPT (b) < BYTE_BUF_Z (b) ? \
1098 (b)->text->gpt : BYTE_BUF_Z (b)) 1063 BYTE_BUF_GPT (b) : BYTE_BUF_Z (b))
1099 #define BUF_CEILING_OF_IGNORE_ACCESSIBLE(b, n) \ 1064 #define BUF_CEILING_OF_IGNORE_ACCESSIBLE(b, n) \
1100 bytebpos_to_charbpos \ 1065 ((n) < BUF_GPT (b) && BUF_GPT (b) < BUF_Z (b) ? \
1101 (b, BYTE_BUF_CEILING_OF_IGNORE_ACCESSIBLE (b, charbpos_to_bytebpos (b, n))) 1066 BUF_GPT (b) : BUF_Z (b))
1102 1067
1103 #define BYTE_BUF_FLOOR_OF_IGNORE_ACCESSIBLE(b, n) \ 1068 #define BYTE_BUF_FLOOR_OF_IGNORE_ACCESSIBLE(b, n) \
1104 (BYTE_BUF_BEG (b) < (b)->text->gpt && (b)->text->gpt < (n) ? \ 1069 (BYTE_BUF_BEG (b) < BYTE_BUF_GPT (b) && BYTE_BUF_GPT (b) < (n) ? \
1105 (b)->text->gpt : BYTE_BUF_BEG (b)) 1070 BYTE_BUF_GPT (b) : BYTE_BUF_BEG (b))
1106 #define BUF_FLOOR_OF_IGNORE_ACCESSIBLE(b, n) \ 1071 #define BUF_FLOOR_OF_IGNORE_ACCESSIBLE(b, n) \
1107 bytebpos_to_charbpos \ 1072 (BUF_BEG (b) < BUF_GPT (b) && BUF_GPT (b) < (n) ? \
1108 (b, BYTE_BUF_FLOOR_OF_IGNORE_ACCESSIBLE (b, charbpos_to_bytebpos (b, n))) 1073 BUF_GPT (b) : BUF_BEG (b))
1109 1074
1110 /* Iterate over contiguous chunks of text in buffer BUF, starting at POS, 1075 /* Iterate over contiguous chunks of text in buffer BUF, starting at POS,
1111 of length LEN. Evaluates POS and LEN only once, but BUF multiply. In 1076 of length LEN. Evaluates POS and LEN only once, but BUF multiply. In
1112 each iteration, store the current chunk into RUNPTR/RUNLEN, which will 1077 each iteration, store the current chunk into RUNPTR/RUNLEN, which will
1113 be automatically declared (don't declare them yourself). This does not 1078 be automatically declared (don't declare them yourself). This does not
1115 limits respected, you need to impose them yourself. 1080 limits respected, you need to impose them yourself.
1116 1081
1117 NOTE: This must be surrounded with braces! */ 1082 NOTE: This must be surrounded with braces! */
1118 1083
1119 #define BUFFER_TEXT_LOOP(buf, pos, len, runptr, runlen) \ 1084 #define BUFFER_TEXT_LOOP(buf, pos, len, runptr, runlen) \
1120 Ibyte *runptr; \ 1085 Ibyte *runptr; \
1121 Bytecount runlen; \ 1086 Bytecount runlen; \
1122 Bytebpos BTL_pos = (pos); \ 1087 Bytebpos BTL_pos = (pos); \
1123 Bytebpos BTL_len = (len); \ 1088 Bytebpos BTL_len = (len); \
1124 for (runptr = BYTE_BUF_BYTE_ADDRESS (buf, BTL_pos), \ 1089 for (runptr = BYTE_BUF_BYTE_ADDRESS (buf, BTL_pos), \
1125 runlen = BYTE_BUF_CEILING_OF_IGNORE_ACCESSIBLE (buf, BTL_pos) - BTL_pos, \ 1090 runlen = BYTE_BUF_CEILING_OF_IGNORE_ACCESSIBLE (buf, BTL_pos) - BTL_pos, \