view src/README.kkcc @ 5776:65d65b52d608

author Aidan Kehoe <kehoea@parhasard.net>
date Thu, 16 Jan 2014 16:27:52 +0000
parents 3889ef128488

2002-07-17  Marcus Crestani  <crestani@informatik.uni-tuebingen.de>
	    Markus Kaltenbach  <makalten@informatik.uni-tuebingen.de>
	    Mike Sperber <mike@xemacs.org>

	updated 2003-07-29

	New KKCC-GC mark algorithm:
	configure flag : --use-kkcc

	For better understanding, first a few words about the mark algorithm
	used up to now:
	Every Lisp_Object has its own mark method, which calls mark_object
	with the objects to be marked.
	In addition, many Lisp_Objects have pdump descriptions
	(memory_descriptions), which are used by the portable dumper. The
	dumper gets all the information it needs about a Lisp_Object from
	its description.

	The garbage collector can use the information in the pdump
	descriptions as well, so we can get rid of the mark methods.
	That is what we have been doing.
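
	The idea of marking from descriptions instead of per-type mark
	methods can be sketched roughly as follows. This is a toy model, not
	the XEmacs code: every toy_* name and the layout of the description
	table are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the real XEmacs definitions. */
enum toy_desc_type { TOY_LISP_OBJECT, TOY_END };

struct toy_description {
  enum toy_desc_type type;
  size_t offset;		/* byte offset of the field in the object */
};

struct toy_object {
  int marked;
  const struct toy_description *description;
};

struct toy_cons {
  struct toy_object header;
  struct toy_object *car;
  struct toy_object *cdr;
};

/* The description lists every field that refers to another object. */
static const struct toy_description toy_cons_description[] = {
  { TOY_LISP_OBJECT, offsetof (struct toy_cons, car) },
  { TOY_LISP_OBJECT, offsetof (struct toy_cons, cdr) },
  { TOY_END, 0 }
};

/* One generic marker walks the description; no per-type mark method. */
static void
toy_mark (struct toy_object *obj)
{
  const struct toy_description *d;

  if (obj == NULL || obj->marked)
    return;
  obj->marked = 1;
  for (d = obj->description; d->type != TOY_END; d++)
    {
      struct toy_object **field =
	(struct toy_object **) ((char *) obj + d->offset);
      toy_mark (*field);
    }
}

/* Builds a small graph with a cycle and one unreachable object,
   then returns how many of the three objects ended up marked. */
static int
toy_demo_description_marking (void)
{
  struct toy_cons a = { { 0, toy_cons_description }, NULL, NULL };
  struct toy_cons b = { { 0, toy_cons_description }, NULL, NULL };
  struct toy_cons unreachable = { { 0, toy_cons_description }, NULL, NULL };

  assert (a.header.marked == 0);
  a.car = &b.header;
  a.cdr = &a.header;		/* cycle back to a */
  toy_mark (&a.header);
  return a.header.marked + b.header.marked + unreachable.header.marked;
}
```

	In this sketch the same table could also drive a dumper, which is
	the reason a single description per type is attractive.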

	
	DUMPABLE FLAG
	-------------
	First we added a dumpable flag to lrecord_implementation. It indicates
	whether the object is dumpable and should be processed by the dumper.
	The dumpable flag is the third argument of an lrecord_implementation
	definition (DEFINE_LRECORD_IMPLEMENTATION).
	If it is set to 1, the dumper processes the description and dumps
	the object; if it is set to 0, the dumper ignores it.
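
	A minimal sketch of how a dumper might consult such a flag. The
	struct and function names here are hypothetical and do not mirror
	the real lrecord_implementation layout; the two example entries only
	follow the remark further down that strings get dumped and image
	instances do not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much reduced stand-in for an implementation record. */
struct toy_implementation {
  const char *name;
  int dumpable;			/* 1: dumper processes it, 0: dumper skips it */
};

/* Returns how many of the given types the toy dumper would process. */
static size_t
toy_count_dumped (const struct toy_implementation *types, size_t n)
{
  size_t i, dumped = 0;

  assert (types != NULL || n == 0);
  for (i = 0; i < n; i++)
    {
      if (!types[i].dumpable)
	continue;		/* the dumper does not care about this type */
      dumped++;			/* a real dumper would walk the description */
    }
  return dumped;
}

static size_t
toy_demo_dumpable (void)
{
  static const struct toy_implementation types[] = {
    { "string", 1 },
    { "image-instance", 0 }
  };

  return toy_count_dumped (types, sizeof types / sizeof types[0]);
}
```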
		

	KKCC MARKING
	------------
	All Lisp_Objects have memory_descriptions now, so we could get
	rid of the mark_object calls.
	The KKCC algorithm manages its own stack. Instead of calling
	mark_object, all live Lisp_Objects are pushed onto the
	kkcc_gc_stack. The elements on the stack are then processed
	according to their descriptions.
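
	The explicit-stack approach can be illustrated with a small,
	self-contained sketch. It is not the actual kkcc_gc_stack code: the
	toy_* names are invented, a fixed children array stands in for a
	memory description, and marking an object at push time is a choice
	made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_CHILDREN 2

/* Toy object; the children array plays the role of the description. */
struct toy_object {
  int marked;
  struct toy_object *children[TOY_MAX_CHILDREN];
};

struct toy_gc_stack {
  struct toy_object **items;
  size_t size;
  size_t capacity;
};

/* Push an object unless it is NULL or already marked. */
static void
toy_stack_push (struct toy_gc_stack *s, struct toy_object *obj)
{
  if (obj == NULL || obj->marked)
    return;
  obj->marked = 1;		/* mark on push, so nothing is pushed twice */
  if (s->size == s->capacity)
    {
      size_t new_capacity = s->capacity ? 2 * s->capacity : 16;
      struct toy_object **p =
	realloc (s->items, new_capacity * sizeof *p);
      assert (p != NULL);
      s->items = p;
      s->capacity = new_capacity;
    }
  s->items[s->size++] = obj;
}

/* Mark everything reachable from root without recursion.
   Returns the number of objects processed. */
static size_t
toy_mark_from_root (struct toy_object *root)
{
  struct toy_gc_stack s = { NULL, 0, 0 };
  size_t processed = 0;

  toy_stack_push (&s, root);
  while (s.size > 0)
    {
      struct toy_object *obj = s.items[--s.size];
      size_t i;

      processed++;
      for (i = 0; i < TOY_MAX_CHILDREN; i++)
	toy_stack_push (&s, obj->children[i]);
    }
  free (s.items);
  return processed;
}

/* Three reachable objects (with a cycle) and one unreachable object. */
static size_t
toy_demo_stack_marking (void)
{
  struct toy_object a = { 0, { NULL, NULL } };
  struct toy_object b = { 0, { NULL, NULL } };
  struct toy_object c = { 0, { NULL, NULL } };
  struct toy_object unreachable = { 0, { NULL, NULL } };

  a.children[0] = &b;
  a.children[1] = &c;
  c.children[0] = &a;		/* cycle back to a */
  (void) unreachable;
  return toy_mark_from_root (&a);
}
```

	Because the pending work lives on a heap-allocated stack, deep
	object graphs do not consume C stack the way recursive marking does.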


	TODO
	----
	- For weakness use weak datatypes instead of XD_FLAG_NO_KKCC.
	  XD_FLAG_NO_KKCC occurs in:
		* elhash.c: htentry
		* extents.c: lispobject_gap_array, extent_list, extent_info
		* marker.c: marker     
	  Not everything has to be rewritten. See Ben's comment in lrecord.h.
	- Clean up special case marking (weak_hash_tables, weak_lists,
	  ephemerons).
	- Stack optimization (keep one stack for the whole runtime instead
	  of allocating and freeing it for every garbage collection)

	For a few Lisp_Objects, the mark method and the pdump description
	differ or do not match exactly.  All of these Lisp_Objects get dumped
	(except image instances), so their descriptions had been written
	before we started our work:
	* alloc.c: string
	description: size_, data_, and plist are described
	mark: only plist is marked, but flush_cached_extent_info is called.
	      flush_cached_extent_info ->
		free_soe ->
		  free_extent_list ->
		    free_gap_array ->
		      gap_array_delete_all_markers ->
			Add gap_array to the gap_array_marker_freelist

	* glyphs.c: image_instance
	description: device is not set to nil
	mark: mark method sets device to nil if dead
	See comment above the description.