changeset 5238:2cc24c69446c

Document the new allocator and the new garbage collector in gc.c and mc-alloc.c.
author Marcus Crestani <crestani@informatik.uni-tuebingen.de>
date Mon, 05 Jul 2010 18:17:39 +0200
parents 6466bc9ebf15
children fca0cf0971de
files src/ChangeLog src/gc.c src/mc-alloc.c
diffstat 3 files changed, 545 insertions(+), 0 deletions(-) [+]
line wrap: on
line diff
--- a/src/ChangeLog	Wed Jun 23 08:04:18 2010 -0400
+++ b/src/ChangeLog	Mon Jul 05 18:17:39 2010 +0200
@@ -1,3 +1,9 @@
+2010-06-05  Marcus Crestani  <crestani@informatik.uni-tuebingen.de>
+
+	* gc.c:
+	* mc-alloc.c:
+	Document the new allocator and the new garbage collector.
+
 2010-06-21  Jeff Sparkes  <jsparkes@gmail.com>
 
 	* console-x-impl.h (DEVICE_X_XFTDRAW): Define, instead of
--- a/src/gc.c	Wed Jun 23 08:04:18 2010 -0400
+++ b/src/gc.c	Mon Jul 05 18:17:39 2010 +0200
@@ -21,6 +21,318 @@
 
 /* Synched up with: Not in FSF. */
 
+/* 
+   Garbage Collectors in XEmacs
+
+   Currently, XEmacs comes with two garbage collectors:
+
+   - The "old garbage collector": a simple mark and sweep collector,
+     its implementation is mainly spread out over gc.c and alloc.c.
+     It is used by the default configuration or if you configure
+     `--with-newgc=no'.
+
+   - The "new garbage collector": an incremental mark and sweep collector,
+     its implementation is in gc.c.  It is used if you configure
+     `--with-newgc'.  It comes with a new allocator, see mc-alloc.c, and
+     with the KKCC mark algorith, see below.
+
+   Additionally, the old garbage collectors comes with two mark algorithms:
+
+   - The "recursive mark algorithm" marks live objects by recursively
+     calling mark_* functions on live objects.  It is the default mark 
+     algorithm of the old garbage collector.
+
+   - The "KKCC mark algorithm" uses an explicit stack that to keep
+     track of the current progress of traversal and uses memory layout
+     descriptions (that are also used by the portable dumper) instead
+     of the mark_* functions.  The old garbage collector uses it if
+     you configure `--with-kkcc'.  It is the default and only mark
+     algorithm of the new garbage collector.
+
+
+   The New Incremental Garbage Collector
+
+   An incremental garbage collector keeps garbage collection pause
+   times short by interleaving small amounts of collection work with
+   program execution, it does that by instrumenting write barrier
+   algorithms that essentially allow interrupting the mark phase.
+
+
+   Write Barrier
+
+   A write barrier is the most important prerequisite for fancy
+   garbage collection techniques.  We implement a "Virtual Dirty Bit
+   (short: vdb) Write Barrier" that makes uses of the operating
+   system's memory-protection mechanisms: The write barrier
+   write-protects memory pages containing heap objects.  If the
+   mutator tries to modify these objects by writing into the
+   write-protected page, the operating system generates a fault.  The
+   write barrier catches this fault, reads out the error-causing
+   address and can thus identify the updated object and page.
+
+   Not all environments and operating systems provide the mechanism to
+   write-protect memory, catch resulting write faults, and read out
+   the faulting address.  But luckily, most of today's operating
+   systems provide the features needed for the write-barrier
+   implementation.  Currently, XEmacs includes write-barrier
+   implementations for the following platforms:
+
+   - POSIX-compliant platforms like up-to-date UNIX, Linux, Solaris,
+     etc. use the system call `mprotect' for memory protection,
+     `sigaction' for signal handling and get the faulting address from
+     `struct siginfo'.  See file vdb-posix.c.
+
+  - Mach-based systems like Mac OS X use "Mach Exception Handlers".
+    See file vdb-mach.c.
+
+  - Windows systems like native Windows and Cygwin use Microsoft's
+    so-called "Structured Exception Handling".  See file vdb-win32.c.
+ 
+  The configure script determines which write barrier implementation
+  to use for a system.  If no write barrier implementation is working
+  on that system, a fall-back "fake" implementation is used: This
+  implementation simply turns of the incremental write barrier at
+  runtime and does not allow any incremental collection (see
+  vdb-fake.c).  The garbage collector then acts like a traditional
+  mark-and-sweep garbage collector.  Generally, the incremental
+  garbage collector can be turned of at runtime by the user or by
+  applications, see below.
+   
+   
+  Memory Protection and Object Layout
+
+  Implementations of a memory-protection mechanism may restrict the
+  size and the alignment of the memory region to be on page-size
+  boundaries.  All objects subject to be covered by the write barrier
+  have to be allocated on logical memory pages, so that they meet the
+  requirement to be write-protected.  The new allocator mc-alloc is
+  aware of a system page size---it allocates all Lisp objects on
+  logical memory pages and is therefore defaulted to on when the new
+  garbage collector is enabled.
+
+  Unfortunately, the Lisp object layout that works with the old
+  collector leads to holes in the write barrier: Not all data
+  structures containing pointers to Lisp objects are allocated on the
+  Lisp heap.  Some Lisp objects do not carry all their information in
+  the object itself.  External parts are kept in separately allocated
+  memory blocks that are not managed by the new Lisp allocator.
+  Examples for these objects are hash tables and dynamic arrays, two
+  objects that can dynamically grow and shrink.  The separate memory
+  blocks are not guaranteed to reside on page boundaries, and thus
+  cannot be watched by the write barrier.
+
+  Moreover, the separate parts can contain live pointers to other Lisp
+  objects.  These pointers are not covered by the write barrier and
+  modifications by the client during garbage collection do escape.  In
+  this case, the client changes the connectivity of the reachability
+  graph behind the collector's back, which eventually leads to
+  erroneous collection of live objects.  To solve this problem, I
+  transformed the separately allocated parts to fully qualified Lisp
+  objects that are managed by the allocator and thus are covered by
+  the write barrier.  This also removes a lot of special allocation
+  and removal code for the out-sourced parts.  Generally, allocating
+  all data structures that contain pointers to Lisp objects on one
+  heap makes the whole memory layout more consistent.
+
+
+  Debugging
+
+  The virtual-dirty-bit write barrier provokes signals on purpose,
+  namely SIGSEGV and SIGBUS.  When debugging XEmacs with this write
+  barrier running, the debugger always breaks whenever a signal
+  occurs.  This behavior is generally desired: A debugger has to break
+  on signals, to allow the user to examine the cause of the
+  signal---especially for illegal memory access, which is a common
+  programming error.  But the debugger should not break for signals
+  caused by the write barrier.  Therefore, most debuggers provide the
+  ability to turn of their fault handling for specific signals.  The
+  configure script generates the debugger's settings .gdbinit and
+  .dbxrc, adding code to turn of signal handling for SIGSEGV and
+  SIGBUS, if the new garbage collector is used.
+
+  But what happens if a bug in XEmacs causes an illegal memory access?
+  To maintain basic debugging abilities, we use another signal: First,
+  the write-barrier signal handler has to determine if the current
+  error situation is caused by the write-barrier memory protection or
+  not.  Therefore, the signal handler checks if the faulting address
+  has been write-protected before.  If it has not, the fault is caused
+  by a bug; the debugger has to break in this situation.  To achieve
+  this, the signal handler raises SIGABRT to abort the program.  Since
+  SIGABRT is not masked out by the debugger, XEmacs aborts and allows
+  the user to examine the problem.
+
+
+  Incremental Garbage Collection
+
+  The new garbage collector is still a mark-and-sweep collector, but
+  now the mark phase no longer runs in one atomic action, it is
+  interleaved with program execution.  The incremental garbage
+  collector needs an explicit mark stack to store the state of the
+  incremental traversal: the KKCC mark algorithm is a prerequisite and
+  is enabled by default when the new garbage collector is on.
+
+  Garbage collection is invoked as before: After `gc-cons-threshold'
+  bytes have been allocated since the last garbage collection (or
+  after `gc-cons-percentage' percentage of the total amount of memory
+  used for Lisp data has been allocated since the last garbage
+  collection) a collection starts.  After some initialization, the
+  marking begins.
+
+  The variable `gc-incremental-traversal-threshold' contains how many
+  steps of incremental work have to be executed in one incremental
+  traversal cycle.  After that many steps have been made, the mark
+  phase is interrupted and the client resumes.  Now, the Lisp memory
+  is write-protected and the write barrier records modified objects.
+  Incremental traversal is resumed after
+  `gc-cons-incremental-threshold' bytes have been allocated since the
+  interruption of garbage collection.  Then, the objects recorded by
+  the write-barrier have to be re-examined by the traversal, i.e. they
+  are re-pushed onto the mark stack and processed again.  Once the
+  mark stack is empty, the traversal is done.
+
+  A full incremental collection is slightly slower than a full garbage
+  collection before: There is an overhead for storing pointers into
+  objects when the write barrier is running, and an overhead for
+  repeated traversal of modified objects.  However, the new
+  incremental garbage collector reduces client pause times to
+  one-third, so even when a garbage collection is running, XEmacs
+  stays reactive.
+
+
+  Tricolor Marking: White, Black, and Grey Mark Bits
+
+  Garbage collection traverses the graph of reachable objects and
+  colors them. The objects subject to garbage collection are white at
+  the beginning. By the end of the collection, those that will be
+  retained are colored black. When there are no reachable objects left
+  to blacken, the traversal of live data structures is finished. In
+  traditional mark-and-sweep collectors, this black and white coloring
+  is sufficient.
+
+  In an incremental collector, the intermediate state of the traversal
+  is im- portant because of ongoing mutator activity: the mutator
+  cannot be allowed to change things in such way that the collector
+  will fail to find all reachable objects. To understand and prevent
+  such interactions between the mutator and the collector, it is
+  useful to introduce a third color, grey.
+
+  Grey objects have been reached by the traversal, but its descendants
+  may not have been. White objects are changed to grey when they are
+  reached by the traversal. Grey objects mark the current state of the
+  traversal: traversal pro- ceeds by processing the grey objects. The
+  KKCC mark stack holds all the currently grey-colored objects.
+  Processing a grey object means following its outgoing pointers, and
+  coloring it black afterwards.
+
+  Intuitively, the traversal proceeds in a wavefront of grey objects
+  that separates the unreached objects, which are colored white, from
+  the already processed black objects.
+
+  The allocator takes care of storing the mark bits: The mark bits are
+  kept in a tree like structure, for details see mc-alloc.c.
+
+
+  Internal States of the Incremental Garbage Collector
+
+  To keep track of its current state, the collector holds it's current
+  phase in the global `gc_state' variable.  A collector phase is one
+  of the following:
+
+  NONE  No incremental or full collection is currently running.
+
+  INIT_GC  The collector prepares for a new collection, e.g. sets some
+    global variables.
+
+  PUSH_ROOT_SET  The collector pushes the root set on the mark stack 
+    to start the traversal of live objects.
+
+  MARK   The traversal of live objects colors the reachable objects
+    white, grey, or black, according to their lifeness.  The mark
+    phase can be interrupted by the incremental collection algorithm:
+    Before the client (i.e. the non collector part of XEmacs) resumes,
+    the write barrier has to be installed so that the collector knows
+    what objects get modified during the collector's pause.
+    Installing a write barrier means protecting pages that only
+    contain black objects and recording write access to these objects.
+    Pages with white or grey objects do not need to be protected,
+    since these pages are due to marking anyways when the collector
+    resumes.  Once the collector resumes, it has to re-scan all
+    objects that have been modified during the collector pause and
+    have been caught by the write barrier.  The mark phase is done when
+    there are no more grey objects on the heap, i.e. the KKCC mark stack
+    is empty.
+
+  REPUSH_ROOT_SET  After the mark phase is done, the collector has to 
+    traverse the root set pointers again, since modifications to the
+    objects in the root set can not all be covered by the write barrier
+    (e.g. root set objects that are on the call stack).  Therefore, the
+    collector has to traverse the root set again without interruption.
+
+  FINISH_MARK  After the mark phase is finished, some objects with
+    special liveness semantics have to be treated separately, e.g.
+    ephemerons and the various flavors of weak objects.
+
+  FINALIZE  The collector registers all objects that have finalizers
+    for finalization.  Finalizations happens asynchronously sometimes
+    after the collection has finished.
+
+  SWEEP  The allocator scans the entire heap and frees all white marked
+    objects. The freed memory is recycled and can be re-used for future
+    allocations. The sweep phase is carried out atomically.
+
+  FINISH_GC  The collector cleans up after the garbage collection by
+    resetting some global variables.
+
+
+  Lisp Interface
+
+  The new garbage collector can be accessed directly from Emacs Lisp.
+  Basically, two functions invoke the garbage collector:
+
+  (gc-full) starts a full garbage collection.  If an incremental
+    garbage collection is already running, it is finished without
+    further interruption.  This function guarantees that unused
+    objects have been freed when it returns.
+
+  (gc-incremental) starts an incremental garbage collection.  If an
+    incremental garbage collection is already running, the next cycle
+    of incremental traversal is started.  The garbage collection is
+    finished if the traversal completes.  Note that this function does
+    not necessarily free any memory.  It only guarantees that the
+    traversal of the heap makes progress.
+
+  The old garbage collector uses the function (garbage-collect) to
+  invoke a garbage collection.  This function is still in use by some
+  applications that explicitly want to invoke a garbage collection.
+  Since these applications may expect that unused memory has really
+  been freed when (garbage-collect) returns, it maps to (gc-full).
+
+  The new garbage collector is highly customizable during runtime; it
+  can even be switched back to the traditional mark-and-sweep garbage
+  collector: The variable allow-incremental-gc controls whether
+  garbage collections may be interrupted or if they have to be carried
+  out in one atomic action.  Setting allow-incremental-gc to nil
+  prevents incremental garbage collection, and the garbage collector
+  then only does full collects, even if (gc-incremental) is called.
+  Non-nil allows incremental garbage collection.
+
+  This way applications can freely decide what garbage collection
+  algorithm is best for the upcoming memory usage.  How frequently a
+  garbage collection occurs and how much traversal work is done in one
+  incremental cycle can also be modified during runtime.  See
+
+    M-x customize RET alloc RET
+
+  for an overview of all settings.
+
+
+  More Information
+
+  More details can be found in
+  http://crestani.de/xemacs/pdf/thesis-newgc.pdf .
+
+*/
+
 #include <config.h>
 #include "lisp.h"
 
@@ -50,8 +362,14 @@
 #include "vdb.h"
 
 
+/* Number of bytes of consing since gc before a full gc should happen. */
 #define GC_CONS_THRESHOLD                  2000000
+
+/* Number of bytes of consing since gc before another cycle of the gc
+   should happen in incremental mode. */
 #define GC_CONS_INCREMENTAL_THRESHOLD       200000
+
+/* Number of elements marked in one cycle of incremental GC. */
 #define GC_INCREMENTAL_TRAVERSAL_THRESHOLD  100000
 
 /* Number of bytes of consing done since the last GC. */
--- a/src/mc-alloc.c	Wed Jun 23 08:04:18 2010 -0400
+++ b/src/mc-alloc.c	Mon Jul 05 18:17:39 2010 +0200
@@ -21,6 +21,227 @@
 
 /* Synched up with: Not in FSF. */
 
+/* 
+   The New Allocator
+
+   The ideas and algorithms are based on the allocator of the
+   Boehm-Demers-Weiser conservative garbage collector. See
+   http://www.hpl.hp.com/personal/Hans_ Boehm/gc/index.html.
+
+   The new allocator is enabled when the new garbage collector
+   is enabled (with `--with-newgc').  The implementation of
+   the new garbage collector is in gc.c.
+
+   The new allocator takes care of:
+   - allocating objects in a write-barrier-friendly way
+   - manage object's mark bits 
+
+   Three-Level Allocation
+
+   The new allocator efficiently manages the allocation of Lisp
+   objects by minimizing the number of times malloc() and free() are
+   called. The allocation process has three layers of abstraction:
+
+   1. It allocates memory in very large chunks called heap sections. 
+   
+   2. The heap sections are subdivided into pages. The page size is
+      determined by the constant PAGE_SIZE. It holds the size of a page
+      in bytes.
+
+   3. One page consists of one or more cells. Each cell represents
+      a memory location for an object. The cells on one page all have
+      the same size, thus every page only contains equal-sized
+      objects.
+
+   If an object is bigger than page size, it is allocated on a
+   multi-page. Then there is only one cell on a multi-page (the cell
+   covers the full multi-page). Is an object smaller than 1/2 PAGE_SIZE,
+   a page contains several objects and several cells. There
+   is only one cell on a page for object sizes from 1/2 PAGE_SIZE to
+   PAGE_SIZE (whereas multi-pages always contain 2 only one
+   cell). Only in layer one malloc() and free() are called.
+
+
+   Size Classes and Page Lists
+
+   Meta-information about every page and multi-page is kept in a page
+   header. The page header contains some bookkeeping information like
+   number of used and free cells, and pointers to other page
+   headers. The page headers are linked in a page list.
+
+   Every page list builds a size class. A size class contains all
+   pages (linked via page headers) for objects of the same size. The
+   new allocator does not group objects based on their type, it groups
+   objects based on their sizes.
+
+   Here is an example: A cons contains a lrecord_header, a car and cdr
+   field. Altogether it uses 12 bytes of memory (on 32 bits
+   machines). All conses are allocated on pages with a cell size of 12
+   bytes. All theses pages are kept together in a page list, which
+   represents the size class for 12 bytes objects. But this size class
+   is not exclusively for conses only. Other objects, which are also
+   12 bytes big (e.g. weak-boxes), are allocated in the same size
+   class and on the same pages.
+
+   The number of size classes is customizable, so is the size step
+   between successive size classes.
+
+
+   Used and Unused Heap
+
+   The memory which is managed by the allocator can be divided in two
+   logical parts:
+
+   The used heap contains pages, on which objects are allocated. These
+   pages are com- pletely or partially occupied. In the used heap, it
+   is important to quickly find a free spot for a new
+   object. Therefore the size classes of the used heap are defined by
+   the size of the cells on the pages. The size classes should match
+   common object sizes, to avoid wasting memory.
+
+   The unused heap only contains completely empty pages. They have
+   never been used or have been freed completely again. In the unused
+   heap, the size of consecutive memory tips the scales. A page is the
+   smallest entity which is asked for. Therefore, the size classes of
+   the unused heap are defined by the number of consecutive pages.
+
+   The parameters for the different size classes can be adjusted
+   independently, see `configurable values' below.
+
+
+   The Allocator's Data Structures
+
+   The struct `mc_allocator_globals holds' all the data structures
+   that the new allocator uses (lists of used and unused pages, mark
+   bits, etc.).
+
+
+   Mapping of Heap Pointers to Page Headers
+
+   For caching benefits, the page headers and mark bits are stored
+   separately from their associated page. During garbage collection
+   (i.e. for marking and freeing objects) it is important to identify
+   the page header which is responsible for a given Lisp object.
+
+   To do this task quickly, I added a two level search tree: the upper
+   10 bits of the heap pointer are the index of the first level. This
+   entry of the first level links to the second level, where the next
+   10 bits of the heap pointer are used to identify the page
+   header. The remaining bits point to the object relative to the
+   page.
+
+   On architectures with more than 32 bits pointers, a hash value of
+   the upper bits is used to index into the first level.
+
+   
+   Mark Bits
+
+   For caching purposes, the mark bits are no longer kept within the
+   objects, they are kept in a separate bit field.
+
+   Every page header has a field for the mark bits of the objects on
+   the page. If there are less cells on the page than there fit bits
+   in the integral data type EMACS_INT, the mark bits are stored
+   directly in this EMACS_INT.
+
+   Otherwise, the mark bits are written in a separate space, with the
+   page header pointing to this space. This happens to pages with
+   rather small objects: many cells fit on a page, thus many mark bits
+   are needed.
+
+
+   Allocate Memory
+
+   Use
+      void *mc_alloc (size_t size) 
+   to request memory from the allocator.  This returns a pointer to a
+   newly allocated block of memory of given size.
+
+   This is how the new allocator allocates memory: 
+   1. Determine the size class of the object. 
+   2. Is there already a page in this size class and is there a free
+      cell on this page?
+      * YES 
+        3. Unlink free cell from free list, return address of free cell. 
+        DONE.
+      * NO 
+        3. Is there a page in the unused heap?
+	* YES 
+          4. Move unused page to used heap. 
+          5. Initialize page header, free list, and mark bits. 
+          6. Unlink first cell from free list, return address of cell. 
+          DONE.
+	* NO 
+          4. Expand the heap, add new memory to unused heap 
+             [go back to 3. and proceed with the YES case].
+
+  The allocator puts partially filled pages to the front of the page
+  list, completely filled ones to the end. That guarantees a fast
+  terminating search for free cells. Are there two successive full
+  pages at the front of the page list, the complete size class is
+  full, a new page has to be added.
+
+
+  Expand Heap
+
+  To expand the heap, a big chunk of contiguous memory is allocated
+  using malloc(). These pieces are called heap sections. How big a new
+  heap section is (and thus the growth of the heap) is adjustable:  See
+  MIN_HEAP_INCREASE, MAX_HEAP_INCREASE, and HEAP_GROWTH_DIVISOR below.
+
+
+  Free Memory
+
+  One optimization in XEmacs is that locally used Lisp objects are
+  freed manually (the memory is not wasted till the next garbage
+  collection). Therefore the new allocator provides this function:
+    void mc_free (void *ptr) 
+  That frees the object pointed to by ptr.
+
+  This function is also used internally during sweep phase of the
+  garbage collection.  This is how it works in detail:
+
+  1. Use pointer to identify page header 
+     (use lookup mechanism described above).
+  2. Mark cell as free and hook it into free list. 
+  3. Is the page completely empty?
+     * YES 
+       4. Unlink page from page list. 
+       5. Remove page header, free list, and mark bits. 
+       6. Move page to unused heap.
+     * NO 
+       4. Move page to front of size class (to speed up allocation 
+          of objects).
+
+  If the last object of a page is freed, the empty page is returned to
+  the unused heap. The allocator tries to coalesce adjacent pages, to
+  gain a big piece of contiguous memory. The resulting chunk is hooked
+  into the according size class of the unused heap. If this created a
+  complete heap section, the heap section is returned to the operating
+  system by using free().
+
+
+  Allocator and Garbage Collector
+
+  The new allocator simplifies the interface to the Garbage Collector:
+  * mark live objects: MARK_[WHITE|GREY|BLACK] (ptr)
+  * sweep heap: EMACS_INT mc_sweep (void)
+  * run finalizers: EMACS_INT mc_finalize (void)
+
+
+  Allocator and Dumper
+
+  The new allocator provides special finalization for the portable
+  dumper (to save disk space): EMACS_INT mc_finalize_for_disksave (void)
+
+
+  More Information
+
+  More details can be found in
+  http://crestani.de/xemacs/pdf/mc-alloc.pdf .
+  
+*/
+
 #include <config.h>
 
 #include "lisp.h"