Mercurial > hg > xemacs-beta
comparison src/buffer.h @ 2367:ecf1ebac70d8
[xemacs-hg @ 2004-11-04 23:05:23 by ben]
commit mega-patch
configure.in: Turn off -Winline and -Wchar-subscripts.
Use the right set of cflags when compiling modules.
Rewrite ldap configuration to separate the inclusion of lber
(needed in recent Cygwin) from the basic checks for the
needed libraries.
add a function for MAKE_JUNK_C; initially code was added to
generate xemacs.def using this, but it will need to be rewritten.
add an rm -f for junk.c to avoid weird Cygwin bug with cp -f onto
an existing file.
Sort list of auto-detected functions and eliminate unused checks for
stpcpy, setlocale and getwd.
Add autodetection of Cygwin scanf problems
BETA: Rewrite section on configure to indicate what flags are important
and what not.
digest-doc.c, make-dump-id.c, profile.c, sorted-doc.c: Add proper decls for main().
make-msgfile.c: Document that this is old junk.
Move proposal to text.c.
make-msgfile.lex: Move proposal to text.c.
make-mswin-unicode.pl: Convert error-generating code so that the entire message will
be seen as a single unrecognized token.
mule/mule-ccl.el: Update docs.
lispref/mule.texi: Update CCL docs.
ldap/eldap.c: Mule-ize.
Use EXTERNAL_LIST_LOOP_2 instead of deleted EXTERNAL_LIST_LOOP.
* XEmacs 21.5.18 "chestnut" is released.
---------------------------------------------------------------
MULE-RELATED WORK:
---------------------------------------------------------------
---------------------------
byte-char conversion
---------------------------
buffer.c, buffer.h, insdel.c, text.c: Port FSF algorithm for byte-char conversion, replacing broken
previous version. Track the char position of the gap. Add
functions to do char-byte conversion downwards as well as upwards.
Move comments about algorithm workings to internals manual.
---------------------------
work on types
---------------------------
alloc.c, console-x-impl.h, dump-data.c, dump-data.h, dumper.c, dialog-msw.c, dired-msw.c, doc.c, editfns.c, esd.c, event-gtk.h, event-msw.c, events.c, file-coding.c, file-coding.h, fns.c, glyphs-eimage.c, glyphs-gtk.c, glyphs-msw.c, glyphs-shared.c, glyphs-x.c, glyphs.c, glyphs.h, gui.c, hpplay.c, imgproc.c, intl-win32.c, lrecord.h, lstream.c, keymap.c, lisp.h, libsst.c, linuxplay.c, miscplay.c, miscplay.h, mule-coding.c, nas.c, nt.c, ntheap.c, ntplay.c, objects-msw.c, objects-tty.c, objects-x.c, print.c, process-nt.c, process.c, redisplay.h, select-common.h, select-gtk.c, select-x.c, sgiplay.c, sound.c, sound.h, sunplay.c, sysfile.h, sysdep.c, syswindows.h, text.c, unexnt.c, win32.c, xgccache.c: Further work on types. This creates a full set of types for all
the basic semantics of `char' that I have so far identified, so that
its semantics can always be identified for the purposes of proper
Mule-safe code, and the raw use of `char' always avoided.
(1) More type renaming, for consistency of naming.
Char_ASCII -> Ascbyte
UChar_ASCII -> UAscbyte
Char_Binary -> CBinbyte
UChar_Binary -> Binbyte
SChar_Binary -> SBinbyte
(2) Introduce Rawbyte, CRawbyte, Boolbyte, Chbyte, UChbyte, and
Bitbyte and use them.
(3) New types Itext, Wexttext and Textcount for separating out
the concepts of bytes and textual units (different under UTF-16
and UTF-32, which are potential internal encodings).
(4) qxestr*_c -> qxestr*_ascii.
lisp.h: New; goes with other qxe() functions. #### Maybe goes in a
different section.
lisp.h: Group generic int-type defs together with EMACS_INT defs.
lisp.h: * lisp.h (WEXTTEXT_IS_WIDE)
New defns.
lisp.h: New type to replace places where int occurs as a boolean.
It's signed because occasionally people may want to use -1 as
an error value, and because unsigned ints are viral -- see comments
in the internals manual against using them.
dynarr.c: int -> Bytecount.
---------------------------
Mule-izing
---------------------------
device-x.c: Partially Mule-ize.
dumper.c, dumper.h: Mule-ize. Use Rawbyte. Use stderr_out not printf. Use wext_*().
sysdep.c, syswindows.h, text.c: New Wexttext API for manipulation of external text that may be
Unicode (e.g. startup code under Windows).
emacs.c: Mule-ize. Properly deal with argv in external encoding.
Use wext_*() and Wexttext. Use Rawbyte.
#if 0 some old junk on SCO that is unlikely to be correct.
Rewrite allocation code in run-temacs.
emacs.c, symsinit.h, win32.c: Rename win32 init function and call it even earlier, to
initialize mswindows_9x_p even earlier, for use in startup code
(XEUNICODE_P).
process.c: Use _wenviron not environ under Windows, to get Unicode environment
variables.
event-Xt.c: Mule-ize drag-n-drop related stuff.
dragdrop.c, dragdrop.h, frame-x.c: Mule-ize.
text.h: Add some more stand-in defines for particular kinds of conversion;
use in Mule-ization work in frame-x.c etc.
---------------------------
Freshening
---------------------------
intl-auto-encap-win32.c, intl-auto-encap-win32.h: Regenerate.
---------------------------
Unicode-work
---------------------------
intl-win32.c, syswindows.h: Factor out common options to MultiByteToWideChar and
WideCharToMultiByte. Add convert_unicode_to_multibyte_malloc()
and convert_unicode_to_multibyte_dynarr() and use. Add stuff for
alloca() conversion of multibyte/unicode.
alloc.c: Use dfc_external_data_len() in case of unicode coding system.
alloc.c, mule-charset.c: Don't zero out and reinit charset Unicode tables. This fucks up
dump-time loading. Anyway, either we load them at dump time or
run time, never both.
unicode.c: Dump the blank tables as well.
---------------------------------------------------------------
DOCUMENTATION, MOSTLY MULE-RELATED:
---------------------------------------------------------------
EmacsFrame.c, emodules.c, event-Xt.c, fileio.c, input-method-xlib.c, mule-wnnfns.c, redisplay-gtk.c, redisplay-tty.c, redisplay-x.c, regex.c, sysdep.c: Add comment about Mule work needed.
text.h: Add more documentation describing why DFC routines were not written
to return their value. Add some other DFC documentation.
console-msw.c, console-msw.h: Add pointer to docs in win32.c.
emacs.c: Add comments on sources of doc info.
text.c, charset.h, unicode.c, intl-win32.c, intl-encap-win32.c, text.h, file-coding.c, mule-coding.c: Collect background comments and related to text matters and
internationalization, and proposals for work to be done, in text.c
or Internals manual, stuff related to specific textual API's in
text.h, and stuff related to internal implementation of Unicode
conversion in unicode.c. Put lots of pointers to the comments to
make them easier to find.
s/mingw32.h, s/win32-common.h, s/win32-native.h, s/windowsnt.h, win32.c: Add bunches of new documentation on the different kinds of
builds and environments under Windows and how they work.
Collect this info in win32.c. Add pointers to these docs in
the relevant s/* files.
emacs.c: Document places with long comments.
Remove comment about exiting, move to internals manual, put
in pointer.
event-stream.c: Move docs about event queues and focus to internals manual, put
in pointer.
events.h: Move docs about event stream callbacks to internals manual, put
in pointer.
profile.c, redisplay.c, signal.c: Move documentation to the Internals manual.
process-nt.c: Add pointer to comment in win32-native.el.
lisp.h: Add comments about some comment conventions.
lisp.h: Add comment about the second argument.
device-msw.c, redisplay-msw.c: @@#### comments are out-of-date.
---------------------------------------------------------------
PDUMP WORK (MOTIVATED BY UNICODE CHANGES)
---------------------------------------------------------------
alloc.c, buffer.c, bytecode.c, console-impl.h, console.c, device.c, dumper.c, lrecord.h, elhash.c, emodules.h, events.c, extents.c, frame.c, glyphs.c, glyphs.h, mule-charset.c, mule-coding.c, objects.c, profile.c, rangetab.c, redisplay.c, specifier.c, specifier.h, window.c, lstream.c, file-coding.h, file-coding.c: PDUMP:
Properly implement dump_add_root_block(), which never worked before,
and is necessary for dumping Unicode tables.
Pdump name changes for accuracy:
XD_STRUCT_PTR -> XD_BLOCK_PTR.
XD_STRUCT_ARRAY -> XD_BLOCK_ARRAY.
XD_C_STRING -> XD_ASCII_STRING.
*_structure_* -> *_block_*.
lrecord.h: some comments added about
dump_add_root_block() vs dump_add_root_block_ptr().
extents.c: remove incorrect comment about pdump problems with gap array.
---------------------------------------------------------------
ALLOCATION
---------------------------------------------------------------
abbrev.c, alloc.c, bytecode.c, casefiddle.c, device-msw.c, device-x.c, dired-msw.c, doc.c, doprnt.c, dragdrop.c, editfns.c, emodules.c, file-coding.c, fileio.c, filelock.c, fns.c, glyphs-eimage.c, glyphs-gtk.c, glyphs-msw.c, glyphs-x.c, gui-msw.c, gui-x.c, imgproc.c, intl-win32.c, lread.c, menubar-gtk.c, menubar.c, nt.c, objects-msw.c, objects-x.c, print.c, process-nt.c, process-unix.c, process.c, realpath.c, redisplay.c, search.c, select-common.c, symbols.c, sysdep.c, syswindows.h, text.c, text.h, ui-byhand.c: New macros {alloca,xnew}_{itext,{i,ext,raw,bin,asc}bytes} for
more convenient allocation of these commonly requested items.
Modify functions to use alloca_ibytes, alloca_array, alloca_extbytes,
xnew_ibytes, etc. also XREALLOC_ARRAY, xnew.
alloc.c: Rewrite the allocation functions to factor out repeated code.
Add assertions for freeing dumped data.
lisp.h: Moved down and consolidated with other allocation stuff.
lisp.h, dynarr.c: New functions for allocation that's very efficient when mostly in
LIFO order.
lisp.h, text.c, text.h: Factor out some stuff for general use by alloca()-conversion funs.
text.h, lisp.h: Fill out convenience routines for allocating various kinds of
bytes and put them in lisp.h. Use them in place of xmalloc(),
ALLOCA().
text.h: Fill out the convenience functions so the _MALLOC() kinds match
the alloca() kinds.
---------------------------------------------------------------
ERROR-CHECKING
---------------------------------------------------------------
text.h: Create ASSERT_ASCTEXT_ASCII() and ASSERT_ASCTEXT_ASCII_LEN()
from similar Eistring checkers and change the Eistring checkers to
use them instead.
---------------------------------------------------------------
MACROS IN LISP.H
---------------------------------------------------------------
lisp.h: Redo GCPRO declarations. Create a "base" set of functions that can
be used to generate any kind of gcpro sets -- regular, ngcpro,
nngcpro, private ones used in GC_EXTERNAL_LIST_LOOP_2.
buffer.c, callint.c, chartab.c, console-msw.c, device-x.c, dialog-msw.c, dired.c, extents.c, ui-gtk.c, rangetab.c, nt.c, mule-coding.c, minibuf.c, menubar-msw.c, menubar.c, menubar-gtk.c, lread.c, lisp.h, gutter.c, glyphs.c, glyphs-widget.c, fns.c, fileio.c, file-coding.c, specifier.c: Eliminate EXTERNAL_LIST_LOOP, which does not check for circularities.
Use EXTERNAL_LIST_LOOP_2 instead or EXTERNAL_LIST_LOOP_3
or EXTERNAL_PROPERTY_LIST_LOOP_3 or GC_EXTERNAL_LIST_LOOP_2
(new macro). Removed/redid comments on EXTERNAL_LIST_LOOP.
---------------------------------------------------------------
SPACING FIXES
---------------------------------------------------------------
callint.c, hftctl.c, number-gmp.c, process-unix.c: Spacing fixes.
---------------------------------------------------------------
FIX FOR GEOMETRY PROBLEM IN FIRST FRAME
---------------------------------------------------------------
unicode.c: Add workaround for newlib bug in sscanf() [should be fixed by
release 1.5.12 of Cygwin].
toolbar.c: bug fix for problem of initial frame being 77 chars wide on Windows.
will be overridden by my other ws.
---------------------------------------------------------------
FIX FOR LEAKING PROCESS HANDLES:
---------------------------------------------------------------
process-nt.c: Fixes for leaking handles. Inspired by work done by Adrian Aichner
<adrian@xemacs.org>.
---------------------------------------------------------------
FIX FOR CYGWIN BUG (Unicode-related):
---------------------------------------------------------------
unicode.c: Add workaround for newlib bug in sscanf() [should be fixed by
release 1.5.12 of Cygwin].
---------------------------------------------------------------
WARNING FIXES:
---------------------------------------------------------------
console-stream.c: `reinit' is unused.
compiler.h, event-msw.c, frame-msw.c, intl-encap-win32.c, text.h: Add stuff to deal with ANSI-aliasing warnings I got.
regex.c: Gather includes together to avoid warning.
---------------------------------------------------------------
CHANGES TO INITIALIZATION ROUTINES:
---------------------------------------------------------------
buffer.c, emacs.c, console.c, debug.c, device-x.c, device.c, dragdrop.c, emodules.c, eval.c, event-Xt.c, event-gtk.c, event-msw.c, event-stream.c, event-tty.c, events.c, extents.c, faces.c, file-coding.c, fileio.c, font-lock.c, frame-msw.c, glyphs-widget.c, glyphs.c, gui-x.c, insdel.c, lread.c, lstream.c, menubar-gtk.c, menubar-x.c, minibuf.c, mule-wnnfns.c, objects-msw.c, objects.c, print.c, scrollbar-x.c, search.c, select-x.c, text.c, undo.c, unicode.c, window.c, symsinit.h: Call reinit_*() functions directly from emacs.c, for clarity.
Factor out some redundant init code. Move disallowed stuff
that had crept into vars_of_glyphs() into complex_vars_of_glyphs().
Call init_eval_semi_early() from eval.c not in the middle of
vars_of_() in emacs.c since there should be no order dependency
in the latter calls.
---------------------------------------------------------------
ARMAGEDDON:
---------------------------------------------------------------
alloc.c, emacs.c, lisp.h, print.c: Rename inhibit_non_essential_printing_operations to
inhibit_non_essential_conversion_operations.
text.c: Assert on !inhibit_non_essential_conversion_operations.
console-msw.c, print.c: Don't do conversion in SetConsoleTitle or FindWindow to avoid
problems during armageddon. Put #errors for NON_ASCII_INTERNAL_FORMAT
in places where problems would arise.
---------------------------------------------------------------
CHANGES TO THE BUILD PROCEDURE:
---------------------------------------------------------------
config.h.in, s/cxux.h, s/usg5-4-2.h, m/powerpc.h: Add comment about correct ordering of this file.
Rearrange everything to follow this -- put all #undefs together
and before the s&m files. Add undefs for HAVE_ALLOCA, C_ALLOCA,
BROKEN_ALLOCA_IN_FUNCTION_CALLS, STACK_DIRECTION. Remove unused
HAVE_STPCPY, HAVE_GETWD, HAVE_SETLOCALE.
m/gec63.h: Deleted; totally broken, not used at all, not in FSF.
m/7300.h, m/acorn.h, m/alliant-2800.h, m/alliant.h, m/altos.h, m/amdahl.h, m/apollo.h, m/att3b.h, m/aviion.h, m/celerity.h, m/clipper.h, m/cnvrgnt.h, m/convex.h, m/cydra5.h, m/delta.h, m/delta88k.h, m/dpx2.h, m/elxsi.h, m/ews4800r.h, m/gould.h, m/hp300bsd.h, m/hp800.h, m/hp9000s300.h, m/i860.h, m/ibmps2-aix.h, m/ibmrs6000.h, m/ibmrt-aix.h, m/ibmrt.h, m/intel386.h, m/iris4d.h, m/iris5d.h, m/iris6d.h, m/irist.h, m/isi-ov.h, m/luna88k.h, m/m68k.h, m/masscomp.h, m/mg1.h, m/mips-nec.h, m/mips-siemens.h, m/mips.h, m/news.h, m/nh3000.h, m/nh4000.h, m/ns32000.h, m/orion105.h, m/pfa50.h, m/plexus.h, m/pmax.h, m/powerpc.h, m/pyrmips.h, m/sequent-ptx.h, m/sequent.h, m/sgi-challenge.h, m/symmetry.h, m/tad68k.h, m/tahoe.h, m/targon31.h, m/tekxd88.h, m/template.h, m/tower32.h, m/tower32v3.h, m/ustation.h, m/vax.h, m/wicat.h, m/xps100.h: Delete C_ALLOCA, HAVE_ALLOCA, STACK_DIRECTION,
BROKEN_ALLOCA_IN_FUNCTION_CALLS. All of this is auto-detected.
When in doubt, I followed recent FSF sources, which also have
these things deleted.
author | ben |
---|---|
date | Thu, 04 Nov 2004 23:08:28 +0000 |
parents | ba4677f54a05 |
children | 6fa9919a9a0b |
comparison
equal
deleted
inserted
replaced
2366:2a392e0c390a | 2367:ecf1ebac70d8 |
---|---|
1 /* Header file for the buffer manipulation primitives. | 1 /* Header file for the buffer manipulation primitives. |
2 Copyright (C) 1985, 1986, 1992, 1993, 1994, 1995 | 2 Copyright (C) 1985, 1986, 1992, 1993, 1994, 1995 |
3 Free Software Foundation, Inc. | 3 Free Software Foundation, Inc. |
4 Copyright (C) 1995 Sun Microsystems, Inc. | 4 Copyright (C) 1995 Sun Microsystems, Inc. |
5 Copyright (C) 2001, 2002 Ben Wing. | 5 Copyright (C) 2001, 2002, 2004 Ben Wing. |
6 | 6 |
7 This file is part of XEmacs. | 7 This file is part of XEmacs. |
8 | 8 |
9 XEmacs is free software; you can redistribute it and/or modify it | 9 XEmacs is free software; you can redistribute it and/or modify it |
10 under the terms of the GNU General Public License as published by the | 10 under the terms of the GNU General Public License as published by the |
72 children pointers. This is not terribly hard, though, and the | 72 children pointers. This is not terribly hard, though, and the |
73 code to maintain this is just like the code already present in | 73 code to maintain this is just like the code already present in |
74 extent-parent and extent-children. | 74 extent-parent and extent-children. |
75 */ | 75 */ |
76 | 76 |
77 #define NUM_CACHED_POSITIONS 50 | |
78 #define NUM_MOVED_POSITIONS 10 | |
79 | |
77 struct buffer_text | 80 struct buffer_text |
78 { | 81 { |
79 Ibyte *beg; /* Actual address of buffer contents. */ | 82 Ibyte *beg; /* Actual address of buffer contents. */ |
80 Bytebpos gpt; /* Index of gap in buffer. */ | 83 Bytebpos gpt; /* Index of gap in buffer. */ |
84 Charbpos bufgpt; /* Equivalent as a Charbpos. */ | |
81 Bytebpos z; /* Index of end of buffer. */ | 85 Bytebpos z; /* Index of end of buffer. */ |
82 Charbpos bufz; /* Equivalent as a Charbpos. */ | 86 Charbpos bufz; /* Equivalent as a Charbpos. */ |
83 Bytecount gap_size;/* Size of buffer's gap */ | 87 Bytecount gap_size;/* Size of buffer's gap */ |
84 Bytecount end_gap_size;/* Size of buffer's end gap */ | 88 Bytecount end_gap_size;/* Size of buffer's end gap */ |
85 long modiff; /* This counts buffer-modification events | 89 long modiff; /* This counts buffer-modification events |
86 for this buffer. It is incremented for | 90 for this buffer. It is incremented for |
87 each such event, and never otherwise | 91 each such event, and never otherwise |
88 changed. */ | 92 changed. */ |
89 long save_modiff; /* Previous value of modiff, as of last | 93 long save_modiff; /* Previous value of modiff, as of last |
90 time buffer visited or saved a file. */ | 94 time buffer visited or saved a file. */ |
91 | 95 |
92 #ifdef MULE | 96 #ifdef MULE |
97 | |
98 #ifdef OLD_BYTE_CHAR | |
93 /* We keep track of a "known" region for very fast access. This | 99 /* We keep track of a "known" region for very fast access. This |
94 information is text-only so it goes here. We update this at each | 100 information is text-only so it goes here. We update this at each |
95 change to the buffer, so if it's entirely ASCII, these will always | 101 change to the buffer, so if it's entirely ASCII, these will always |
96 contain the minimum and maximum positions of the buffer. */ | 102 contain the minimum and maximum positions of the buffer. */ |
97 Charbpos mule_bufmin, mule_bufmax; | 103 Charbpos mule_bufmin, mule_bufmax; |
98 Bytebpos mule_bytmin, mule_bytmax; | 104 Bytebpos mule_bytmin, mule_bytmax; |
99 int mule_shifter, mule_three_p; | 105 int mule_shifter, mule_three_p; |
100 | 106 #endif |
101 /* And we also cache 16 positions for fairly fast access near those | 107 |
102 positions. */ | 108 /* And we also cache NUM_CACHED_POSITIONS positions for fairly fast |
103 Charbpos mule_charbpos_cache[16]; | 109 access near those positions. */ |
104 Bytebpos mule_bytebpos_cache[16]; | 110 Charbpos mule_charbpos_cache[NUM_CACHED_POSITIONS]; |
111 Bytebpos mule_bytebpos_cache[NUM_CACHED_POSITIONS]; | |
112 int next_cache_pos; | |
113 | |
114 Charbpos cached_charpos; | |
115 Bytebpos cached_bytepos; | |
105 | 116 |
106 /* True if all chars fit into one byte; | 117 /* True if all chars fit into one byte; |
107 == (format == FORMAT_8_BIT_FIXED || | 118 == (format == FORMAT_8_BIT_FIXED || |
108 (format == FORMAT_DEFAULT && num_ascii_chars == bufz - 1)) | 119 (format == FORMAT_DEFAULT && num_ascii_chars == bufz - 1)) |
109 kept around to speed up (slightly?) the byte-char conversion routines. */ | 120 kept around to speed up (slightly?) the byte-char conversion routines. */ |
115 /* Number of chars in buffer that would fit in an 16-bit-fixed buffer. */ | 126 /* Number of chars in buffer that would fit in an 16-bit-fixed buffer. */ |
116 Charcount num_16_bit_fixed_chars; | 127 Charcount num_16_bit_fixed_chars; |
117 | 128 |
118 /* Currently we only handle 8 bit fixed and default */ | 129 /* Currently we only handle 8 bit fixed and default */ |
119 Internal_Format format; | 130 Internal_Format format; |
120 #endif | 131 #endif /* MULE */ |
121 | 132 |
122 /* Similar to the above, we keep track of positions for which line | 133 /* Similar to the above, we keep track of positions for which line |
123 number has last been calculated. See line-number.c. */ | 134 number has last been calculated. See line-number.c. */ |
124 Lisp_Object line_number_cache; | 135 Lisp_Object line_number_cache; |
125 | 136 |
309 #define BUF_ZV(buf) ((buf)->bufzv + 0) | 320 #define BUF_ZV(buf) ((buf)->bufzv + 0) |
310 | 321 |
311 /* End of buffer. */ | 322 /* End of buffer. */ |
312 #define BYTE_BUF_Z(buf) ((buf)->text->z + 0) | 323 #define BYTE_BUF_Z(buf) ((buf)->text->z + 0) |
313 #define BUF_Z(buf) ((buf)->text->bufz + 0) | 324 #define BUF_Z(buf) ((buf)->text->bufz + 0) |
325 | |
326 /* Gap location. */ | |
327 #define BYTE_BUF_GPT(buf) ((buf)->text->gpt + 0) | |
328 #define BUF_GPT(buf) ((buf)->text->bufgpt + 0) | |
314 | 329 |
315 /* Point. */ | 330 /* Point. */ |
316 #define BYTE_BUF_PT(buf) ((buf)->pt + 0) | 331 #define BYTE_BUF_PT(buf) ((buf)->pt + 0) |
317 #define BUF_PT(buf) ((buf)->bufpt + 0) | 332 #define BUF_PT(buf) ((buf)->bufpt + 0) |
318 | 333 |
534 | 549 |
535 /*----------------------------------------------------------------------*/ | 550 /*----------------------------------------------------------------------*/ |
536 /* Converting between byte and character positions */ | 551 /* Converting between byte and character positions */ |
537 /*----------------------------------------------------------------------*/ | 552 /*----------------------------------------------------------------------*/ |
538 | 553 |
554 /* | |
555 | |
556 Info on Byte-Char conversion: | |
557 | |
558 (Info-goto-node "(internals)Byte-Char Position Conversion") | |
559 */ | |
560 | |
539 #ifdef MULE | 561 #ifdef MULE |
540 | |
541 /* The basic algorithm we use is to keep track of a known region of | |
542 characters in each buffer, all of which are of the same width. We keep | |
543 track of the boundaries of the region in both Charbpos and Bytebpos | |
544 coordinates and also keep track of the char width, which is 1 - 4 bytes. | |
545 If the position we're translating is not in the known region, then we | |
546 invoke a function to update the known region to surround the position in | |
547 question. This assumes locality of reference, which is usually the | |
548 case. | |
549 | |
550 Note that the function to update the known region can be simple or | |
551 complicated depending on how much information we cache. In addition to | |
552 the known region, we always cache the correct conversions for point, | |
553 BEGV, and ZV, and in addition to this we cache 16 positions where the | |
554 conversion is known. We only look in the cache or update it when we | |
555 need to move the known region more than a certain amount (currently 50 | |
556 chars), and then we throw away a "random" value and replace it with the | |
557 newly calculated value. | |
558 | |
559 Finally, we maintain an extra flag that tracks whether the buffer is | |
560 entirely ASCII, to speed up the conversions even more. This flag is | |
561 actually of dubious value because in an entirely-ASCII buffer the known | |
562 region will always span the entire buffer (in fact, we update the flag | |
563 based on this fact), and so all we're saving is a few machine cycles. | |
564 | |
565 A potentially smarter method than what we do with known regions and | |
566 cached positions would be to keep some sort of pseudo-extent layer over | |
567 the buffer; maybe keep track of the charbpos/bytebpos correspondence at the | |
568 beginning of each line, which would allow us to do a binary search over | |
569 the pseudo-extents to narrow things down to the correct line, at which | |
570 point you could use a linear movement method. This would also mesh well | |
571 with efficiently implementing a line-numbering scheme. However, you | |
572 have to weigh the amount of time spent updating the cache vs. the | |
573 savings that result from it. In reality, we modify the buffer far less | |
574 often than we access it, so a cache of this sort that provides | |
575 guaranteed LOG (N) performance (or perhaps N * LOG (N), if we set a | |
576 maximum on the cache size) would indeed be a win, particularly in very | |
577 large buffers. If we ever implement this, we should probably set a | |
578 reasonably high minimum below which we use the old method, because the | |
579 time spent updating the fancy cache would likely become dominant when | |
580 making buffer modifications in smaller buffers. | |
581 | |
582 Note also that we have to multiply or divide by the char width in order | |
583 to convert the positions. We do some tricks to avoid ever actually | |
584 having to do a multiply or divide, because that is typically an | |
585 expensive operation (esp. divide). Multiplying or dividing by 1, 2, or | |
586 4 can be implemented simply as a shift left or shift right, and we keep | |
587 track of a shifter value (0, 1, or 2) indicating how much to shift. | |
588 Multiplying by 3 can be implemented by doubling and then adding the | |
589 original value. Dividing by 3, alas, cannot be implemented in any | |
590 simple shift/subtract method, as far as I know; so we just do a table | |
591 lookup. For simplicity, we use a table of size 128K, which indexes the | |
592 "divide-by-3" values for the first 64K non-negative numbers. (Note that | |
593 we can increase the size up to 384K, i.e. indexing the first 192K | |
594 non-negative numbers, while still using shorts in the array.) This also | |
595 means that the size of the known region can be at most 64K for | |
596 width-three characters. | |
597 | |
598 !!#### We should investigate the algorithm in GNU Emacs. I think it | |
599 does something similar, but it may differ in some details, and it's | |
600 worth seeing if anything can be gleaned. | |
601 */ | |
602 | 562 |
603 Bytebpos charbpos_to_bytebpos_func (struct buffer *buf, Charbpos x); | 563 Bytebpos charbpos_to_bytebpos_func (struct buffer *buf, Charbpos x); |
604 Charbpos bytebpos_to_charbpos_func (struct buffer *buf, Bytebpos x); | 564 Charbpos bytebpos_to_charbpos_func (struct buffer *buf, Bytebpos x); |
605 extern short three_to_one_table[]; | 565 extern short three_to_one_table[]; |
606 | 566 |
621 retval = (Bytebpos) x; | 581 retval = (Bytebpos) x; |
622 else if (BUF_FORMAT (buf) == FORMAT_16_BIT_FIXED) | 582 else if (BUF_FORMAT (buf) == FORMAT_16_BIT_FIXED) |
623 retval = (Bytebpos) (x << 1); | 583 retval = (Bytebpos) (x << 1); |
624 else if (BUF_FORMAT (buf) == FORMAT_32_BIT_FIXED) | 584 else if (BUF_FORMAT (buf) == FORMAT_32_BIT_FIXED) |
625 retval = (Bytebpos) (x << 2); | 585 retval = (Bytebpos) (x << 2); |
586 #ifdef OLD_BYTE_CHAR | |
626 else if (x >= buf->text->mule_bufmin && x <= buf->text->mule_bufmax) | 587 else if (x >= buf->text->mule_bufmin && x <= buf->text->mule_bufmax) |
627 retval = (buf->text->mule_bytmin + | 588 retval = (buf->text->mule_bytmin + |
628 ((x - buf->text->mule_bufmin) << buf->text->mule_shifter) + | 589 ((x - buf->text->mule_bufmin) << buf->text->mule_shifter) + |
629 (buf->text->mule_three_p ? (x - buf->text->mule_bufmin) : | 590 (buf->text->mule_three_p ? (x - buf->text->mule_bufmin) : |
630 (Bytebpos) 0)); | 591 (Bytebpos) 0)); |
592 #endif /* OLD_BYTE_CHAR */ | |
631 else | 593 else |
632 retval = charbpos_to_bytebpos_func (buf, x); | 594 retval = charbpos_to_bytebpos_func (buf, x); |
633 #else | 595 #else |
634 retval = (Bytebpos) x; | 596 retval = (Bytebpos) x; |
635 #endif | 597 #endif |
652 retval = (Charbpos) x; | 614 retval = (Charbpos) x; |
653 else if (BUF_FORMAT (buf) == FORMAT_16_BIT_FIXED) | 615 else if (BUF_FORMAT (buf) == FORMAT_16_BIT_FIXED) |
654 retval = (Charbpos) (x >> 1); | 616 retval = (Charbpos) (x >> 1); |
655 else if (BUF_FORMAT (buf) == FORMAT_32_BIT_FIXED) | 617 else if (BUF_FORMAT (buf) == FORMAT_32_BIT_FIXED) |
656 retval = (Charbpos) (x >> 2); | 618 retval = (Charbpos) (x >> 2); |
619 #ifdef OLD_BYTE_CHAR | |
657 else if (x >= buf->text->mule_bytmin && x <= buf->text->mule_bytmax) | 620 else if (x >= buf->text->mule_bytmin && x <= buf->text->mule_bytmax) |
658 retval = (buf->text->mule_bufmin + | 621 retval = (buf->text->mule_bufmin + |
659 ((buf->text->mule_three_p | 622 ((buf->text->mule_three_p |
660 ? three_to_one_table[x - buf->text->mule_bytmin] | 623 ? three_to_one_table[x - buf->text->mule_bytmin] |
661 : (x - buf->text->mule_bytmin) >> buf->text->mule_shifter))); | 624 : (x - buf->text->mule_bytmin) >> buf->text->mule_shifter))); |
625 #endif /* OLD_BYTE_CHAR */ | |
662 else | 626 else |
663 retval = bytebpos_to_charbpos_func (buf, x); | 627 retval = bytebpos_to_charbpos_func (buf, x); |
664 #else | 628 #else |
665 retval = (Charbpos) x; | 629 retval = (Charbpos) x; |
666 #endif | 630 #endif |
1069 Currently there will be at most two iterations in the loop, but it is | 1033 Currently there will be at most two iterations in the loop, but it is |
1070 written in such a way that it will still work if the buffer | 1034 written in such a way that it will still work if the buffer |
1071 representation is changed to have multiple gaps in it. | 1035 representation is changed to have multiple gaps in it. |
1072 */ | 1036 */ |
1073 | 1037 |
1074 | |
1075 /* Return the maximum position in the buffer it is safe to scan forwards | 1038 /* Return the maximum position in the buffer it is safe to scan forwards |
1076 past N to. This is used to prevent buffer scans from running into | 1039 past N to. This is used to prevent buffer scans from running into |
1077 the gap (e.g. search.c). All characters between N and CEILING_OF(N) | 1040 the gap (e.g. search.c). All characters between N and CEILING_OF(N) |
1078 are located contiguous in memory. Note that the character *at* | 1041 are located contiguous in memory. Note that the character *at* |
1079 CEILING_OF(N) is not contiguous in memory. */ | 1042 CEILING_OF(N) is not contiguous in memory. */ |
1080 #define BYTE_BUF_CEILING_OF(b, n) \ | 1043 #define BYTE_BUF_CEILING_OF(b, n) \ |
1081 ((n) < (b)->text->gpt && (b)->text->gpt < BYTE_BUF_ZV (b) ? \ | 1044 ((n) < BYTE_BUF_GPT (b) && BYTE_BUF_GPT (b) < BYTE_BUF_ZV (b) ? \ |
1082 (b)->text->gpt : BYTE_BUF_ZV (b)) | 1045 BYTE_BUF_GPT (b) : BYTE_BUF_ZV (b)) |
1083 #define BUF_CEILING_OF(b, n) \ | 1046 #define BUF_CEILING_OF(b, n) \ |
1084 bytebpos_to_charbpos (b, BYTE_BUF_CEILING_OF (b, charbpos_to_bytebpos (b, n))) | 1047 ((n) < BUF_GPT (b) && BUF_GPT (b) < BUF_ZV (b) ? \ |
1048 BUF_GPT (b) : BUF_ZV (b)) | |
1085 | 1049 |
1086 /* Return the minimum position in the buffer it is safe to scan backwards | 1050 /* Return the minimum position in the buffer it is safe to scan backwards |
1087 past N to. All characters between FLOOR_OF(N) and N are located | 1051 past N to. All characters between FLOOR_OF(N) and N are located |
1088 contiguous in memory. Note that the character *at* N may not be | 1052 contiguous in memory. Note that the character *at* N may not be |
1089 contiguous in memory. */ | 1053 contiguous in memory. */ |
1090 #define BYTE_BUF_FLOOR_OF(b, n) \ | 1054 #define BYTE_BUF_FLOOR_OF(b, n) \ |
1091 (BYTE_BUF_BEGV (b) < (b)->text->gpt && (b)->text->gpt < (n) ? \ | 1055 (BYTE_BUF_BEGV (b) < BYTE_BUF_GPT (b) && BYTE_BUF_GPT (b) < (n) ? \ |
1092 (b)->text->gpt : BYTE_BUF_BEGV (b)) | 1056 BYTE_BUF_GPT (b) : BYTE_BUF_BEGV (b)) |
1093 #define BUF_FLOOR_OF(b, n) \ | 1057 #define BUF_FLOOR_OF(b, n) \ |
1094 bytebpos_to_charbpos (b, BYTE_BUF_FLOOR_OF (b, charbpos_to_bytebpos (b, n))) | 1058 (BUF_BEGV (b) < BUF_GPT (b) && BUF_GPT (b) < (n) ? \ |
1059 BUF_GPT (b) : BUF_BEGV (b)) | |
1095 | 1060 |
1096 #define BYTE_BUF_CEILING_OF_IGNORE_ACCESSIBLE(b, n) \ | 1061 #define BYTE_BUF_CEILING_OF_IGNORE_ACCESSIBLE(b, n) \ |
1097 ((n) < (b)->text->gpt && (b)->text->gpt < BYTE_BUF_Z (b) ? \ | 1062 ((n) < BYTE_BUF_GPT (b) && BYTE_BUF_GPT (b) < BYTE_BUF_Z (b) ? \ |
1098 (b)->text->gpt : BYTE_BUF_Z (b)) | 1063 BYTE_BUF_GPT (b) : BYTE_BUF_Z (b)) |
1099 #define BUF_CEILING_OF_IGNORE_ACCESSIBLE(b, n) \ | 1064 #define BUF_CEILING_OF_IGNORE_ACCESSIBLE(b, n) \ |
1100 bytebpos_to_charbpos \ | 1065 ((n) < BUF_GPT (b) && BUF_GPT (b) < BUF_Z (b) ? \ |
1101 (b, BYTE_BUF_CEILING_OF_IGNORE_ACCESSIBLE (b, charbpos_to_bytebpos (b, n))) | 1066 BUF_GPT (b) : BUF_Z (b)) |
1102 | 1067 |
1103 #define BYTE_BUF_FLOOR_OF_IGNORE_ACCESSIBLE(b, n) \ | 1068 #define BYTE_BUF_FLOOR_OF_IGNORE_ACCESSIBLE(b, n) \ |
1104 (BYTE_BUF_BEG (b) < (b)->text->gpt && (b)->text->gpt < (n) ? \ | 1069 (BYTE_BUF_BEG (b) < BYTE_BUF_GPT (b) && BYTE_BUF_GPT (b) < (n) ? \ |
1105 (b)->text->gpt : BYTE_BUF_BEG (b)) | 1070 BYTE_BUF_GPT (b) : BYTE_BUF_BEG (b)) |
1106 #define BUF_FLOOR_OF_IGNORE_ACCESSIBLE(b, n) \ | 1071 #define BUF_FLOOR_OF_IGNORE_ACCESSIBLE(b, n) \ |
1107 bytebpos_to_charbpos \ | 1072 (BUF_BEG (b) < BUF_GPT (b) && BUF_GPT (b) < (n) ? \ |
1108 (b, BYTE_BUF_FLOOR_OF_IGNORE_ACCESSIBLE (b, charbpos_to_bytebpos (b, n))) | 1073 BUF_GPT (b) : BUF_BEG (b)) |
1109 | 1074 |
1110 /* Iterate over contiguous chunks of text in buffer BUF, starting at POS, | 1075 /* Iterate over contiguous chunks of text in buffer BUF, starting at POS, |
1111 of length LEN. Evaluates POS and LEN only once, but BUF multiply. In | 1076 of length LEN. Evaluates POS and LEN only once, but BUF multiply. In |
1112 each iteration, store the current chunk into RUNPTR/RUNLEN, which will | 1077 each iteration, store the current chunk into RUNPTR/RUNLEN, which will |
1113 be automatically declared (don't declare them yourself). This does not | 1078 be automatically declared (don't declare them yourself). This does not |
1115 limits respected, you need to impose them yourself. | 1080 limits respected, you need to impose them yourself. |
1116 | 1081 |
1117 NOTE: This must be surrounded with braces! */ | 1082 NOTE: This must be surrounded with braces! */ |
1118 | 1083 |
1119 #define BUFFER_TEXT_LOOP(buf, pos, len, runptr, runlen) \ | 1084 #define BUFFER_TEXT_LOOP(buf, pos, len, runptr, runlen) \ |
1120 Ibyte *runptr; \ | 1085 Ibyte *runptr; \ |
1121 Bytecount runlen; \ | 1086 Bytecount runlen; \ |
1122 Bytebpos BTL_pos = (pos); \ | 1087 Bytebpos BTL_pos = (pos); \ |
1123 Bytebpos BTL_len = (len); \ | 1088 Bytebpos BTL_len = (len); \ |
1124 for (runptr = BYTE_BUF_BYTE_ADDRESS (buf, BTL_pos), \ | 1089 for (runptr = BYTE_BUF_BYTE_ADDRESS (buf, BTL_pos), \ |
1125 runlen = BYTE_BUF_CEILING_OF_IGNORE_ACCESSIBLE (buf, BTL_pos) - BTL_pos, \ | 1090 runlen = BYTE_BUF_CEILING_OF_IGNORE_ACCESSIBLE (buf, BTL_pos) - BTL_pos, \ |