  1. <HTML>
  2. <HEAD>
  3. <TITLE> Conservative GC Algorithmic Overview </TITLE>
  4. <AUTHOR> Hans-J. Boehm, HP Labs (Much of this was written at SGI)</author>
  5. </HEAD>
  6. <BODY>
  7. <H1> <I>This is under construction, and may always be.</i> </h1>
  8. <H1> Conservative GC Algorithmic Overview </h1>
  9. <P>
  10. This is a description of the algorithms and data structures used in our
  11. conservative garbage collector. I expect the level of detail to increase
  12. with time. For a survey of GC algorithms, see for example
  13. <A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's
  14. excellent paper</a>. For an overview of the collector interface,
  15. see <A HREF="gcinterface.html">here</a>.
  16. <P>
  17. This description is targeted primarily at someone trying to understand the
  18. source code. It specifically refers to variable and function names.
  19. It may also be useful for understanding the algorithms at a higher level.
  20. <P>
  21. The description here assumes that the collector is used in default mode.
  22. In particular, we assume that it is used as a garbage collector, and not just
  23. a leak detector. We initially assume that it is used in stop-the-world,
  24. non-incremental mode, though the presence of the incremental collector
  25. will be apparent in the design.
  26. We assume the default finalization model, but the code affected by that
  27. is very localized.
  28. <H2> Introduction </h2>
  29. The garbage collector uses a modified mark-sweep algorithm. Conceptually
  30. it operates roughly in four phases, which are performed occasionally
  31. as part of a memory allocation:
  32. <OL>
  33. <LI>
  34. <I>Preparation</i> Each object has an associated mark bit.
  35. Clear all mark bits, indicating that all objects
  36. are potentially unreachable.
  37. <LI>
  38. <I>Mark phase</i> Marks all objects that can be reachable via chains of
  39. pointers from variables. Often the collector has no real information
  40. about the location of pointer variables in the heap, so it
  41. views all static data areas, stacks and registers as potentially containing
  42. pointers. Any bit patterns that represent addresses inside
  43. heap objects managed by the collector are viewed as pointers.
  44. Unless the client program has made heap object layout information
  45. available to the collector, any heap objects found to be reachable from
  46. variables are again scanned similarly.
  47. <LI>
  48. <I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked,
  49. objects, and returns them to an appropriate free list for reuse. This is
  50. not really a separate phase; even in non-incremental mode this operation
  51. is usually performed on demand during an allocation that discovers an empty
  52. free list. Thus the sweep phase is very unlikely to touch a page that
  53. would not have been touched shortly thereafter anyway.
  54. <LI>
  55. <I>Finalization phase</i> Unreachable objects which had been registered
  56. for finalization are enqueued for finalization outside the collector.
  57. </ol>
  58. <P>
  59. The remaining sections describe the memory allocation data structures,
  60. and then the last 3 collection phases in more detail. We conclude by
  61. outlining some of the additional features implemented in the collector.
  62. <H2>Allocation</h2>
  63. The collector includes its own memory allocator. The allocator obtains
  64. memory from the system in a platform-dependent way. Under UNIX, it
  65. uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>.
  66. <P>
  67. Most static data used by the allocator, as well as that needed by the
  68. rest of the garbage collector is stored inside the
  69. <TT>_GC_arrays</tt> structure.
  70. This allows the garbage collector to easily ignore the collector's own
  71. data structures when it searches for root pointers. Other allocator
  72. and collector internal data structures are allocated dynamically
  73. with <TT>GC_scratch_alloc</tt>. <TT>GC_scratch_alloc</tt> does not
  74. allow for deallocation, and is therefore used only for permanent data
  75. structures.
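<P>
The following is a minimal sketch of an allocate-only ("bump") allocator in
the spirit of <TT>GC_scratch_alloc</tt>. It is not the collector's code: the
fixed arena, its size, and the name <TT>scratch_alloc</tt> are assumptions
made for illustration, and the real routine obtains its memory from the
operating system.

```c
#include <assert.h>
#include <stddef.h>

#define SCRATCH_WORDS 512   /* hypothetical arena size, in pointer-sized words */

/* A word array, so that the arena is suitably aligned for pointers. */
static void *scratch_arena[SCRATCH_WORDS];
static size_t scratch_words_used = 0;

/* Bump allocation: hand out the next unused words, or NULL when the arena
   is exhausted.  There is deliberately no matching free routine. */
void *scratch_alloc(size_t bytes)
{
    /* Round the request up to a whole number of words. */
    size_t words = bytes / sizeof(void *) + (bytes % sizeof(void *) != 0);

    assert(scratch_words_used <= SCRATCH_WORDS);
    if (words > SCRATCH_WORDS - scratch_words_used)
        return NULL;
    void *result = &scratch_arena[scratch_words_used];
    scratch_words_used += words;
    return result;
}
```

Because nothing is ever returned to the arena, no per-object bookkeeping is
needed, which is why such an allocator suits only permanent data structures.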
  76. <P>
  77. The allocator allocates objects of different <I>kinds</i>.
  78. Different kinds are handled somewhat differently by certain parts
  79. of the garbage collector. Certain kinds are scanned for pointers,
  80. others are not. Some may have per-object type descriptors that
  81. determine pointer locations. Or a specific kind may correspond
  82. to one specific object layout. Two built-in kinds are uncollectable.
  83. One (<TT>STUBBORN</tt>) is immutable without special precautions.
  84. In spite of that, it is very likely that most C clients of the
  85. collector currently
  86. use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects.
  87. The <A HREF="http://gcc.gnu.org/java">gcj</a> runtime also makes
  88. heavy use of a kind (allocated with GC_gcj_malloc) that stores
  89. type information at a known offset in method tables.
  90. <P>
  91. The collector uses a two level allocator. A large block is defined to
  92. be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2,
  93. typically on the order of the page size.
  94. <P>
  95. Large block sizes are rounded up to
  96. the next multiple of <TT>HBLKSIZE</tt> and then allocated by
  97. <TT>GC_allochblk</tt>. Recent versions of the collector
  98. use an approximate best fit algorithm by keeping free lists for
  99. several large block sizes.
  100. The actual
  101. implementation of <TT>GC_allochblk</tt>
  102. is significantly complicated by black-listing issues
  103. (see below).
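<P>
As a small illustration of the size classification and rounding just
described, the sketch below assumes an <TT>HBLKSIZE</tt> of 4096 bytes (the
real value is platform-dependent). The function names are hypothetical and
do not appear in the collector.

```c
#include <assert.h>
#include <stddef.h>

#define HBLKSIZE ((size_t)4096)   /* assumed value; a power of 2 */

/* A request is "large" if it is larger than half of HBLKSIZE. */
int is_large_request(size_t bytes)
{
    return bytes > HBLKSIZE / 2;
}

/* Number of HBLKSIZE-sized blocks needed for a large request, i.e. the
   request rounded up to the next multiple of HBLKSIZE, counted in blocks. */
size_t blocks_needed(size_t bytes)
{
    assert(bytes <= (size_t)-1 - (HBLKSIZE - 1));   /* no overflow below */
    return (bytes + HBLKSIZE - 1) / HBLKSIZE;
}
```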
  104. <P>
  105. Small blocks are allocated in chunks of size <TT>HBLKSIZE</tt>.
  106. Each chunk is
  107. dedicated to only one object size and kind. The allocator maintains
  108. separate free lists for each size and kind of object.
  109. <P>
  110. Once a large block is split for use in smaller objects, it can only
  111. be used for objects of that size, unless the collector discovers a completely
  112. empty chunk. Completely empty chunks are restored to the appropriate
  113. large block free list.
  114. <P>
  115. In order to avoid allocating blocks for too many distinct object sizes,
  116. the collector normally does not directly allocate objects of every possible
  117. request size. Instead requests are rounded up to one of a smaller number
  118. of allocated sizes, for which free lists are maintained. The exact
  119. allocated sizes are computed on demand, but subject to the constraint
  120. that they increase roughly in geometric progression. Thus objects
  121. requested early in the execution are likely to be allocated with exactly
  122. the requested size, subject to alignment constraints.
  123. See <TT>GC_init_size_map</tt> for details.
  124. <P>
  125. The actual size rounding operation during small object allocation is
  126. implemented as a table lookup in <TT>GC_size_map</tt>.
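<P>
The sketch below shows the general shape of a lazily filled size table.
It is a simplification under stated assumptions: the 16-byte granule, the
2048-byte limit, and the fixed rounding rule in <TT>class_granules</tt> are
all hypothetical, and the rule is not the adaptive computation performed by
the collector. Only the idea of a table lookup with a slow path for
uncomputed entries is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

#define GRANULE   16     /* assumed allocation granule, in bytes */
#define MAX_SMALL 2048   /* assumed largest "small" request */

/* Entry 0 means "not computed yet"; entries are filled on first use. */
static size_t size_map[MAX_SMALL + 1];

/* Hypothetical rounding rule: keep roughly three significant bits of the
   granule count, so that successive size classes grow geometrically. */
size_t class_granules(size_t granules)
{
    size_t step = 1;
    while (granules / step >= 8)
        step *= 2;
    return (granules + step - 1) / step * step;
}

/* Map a request size in bytes to the size actually allocated. */
size_t lookup_size(size_t request)
{
    assert(request >= 1 && request <= MAX_SMALL);
    if (size_map[request] == 0) {          /* slow path: compute the entry */
        size_t granules = (request + GRANULE - 1) / GRANULE;
        size_map[request] = class_granules(granules) * GRANULE;
    }
    return size_map[request];              /* fast path: one table load */
}
```

With this rule a 130-byte request is served from the 160-byte class and a
260-byte request from the 320-byte class.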
  127. <P>
  128. Both collector initialization and computation of allocated sizes are
  129. handled carefully so that they do not slow down the small object fast
  130. allocation path. An attempt to allocate before the collector is initialized,
  131. or before the appropriate <TT>GC_size_map</tt> entry is computed,
  132. will take the same path as an allocation attempt with an empty free list.
  133. This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>)
  134. which performs the appropriate initialization checks.
  135. <P>
  136. In non-incremental mode, we make a decision about whether to garbage collect
  137. whenever an allocation would otherwise have failed with the current heap size.
  138. If the total amount of allocation since the last collection is less than
  139. the heap size divided by <TT>GC_free_space_divisor</tt>, we try to
  140. expand the heap. Otherwise, we initiate a garbage collection. This ensures
  141. that the amount of garbage collection work per allocated byte remains
  142. constant.
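<P>
The basic decision can be sketched as follows. This is only the simplified
rule stated above, without any of the real heuristic's adjustments; the
function and type names are hypothetical, and only
<TT>GC_free_space_divisor</tt> corresponds to a name used in the text.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EXPAND_HEAP, COLLECT } gc_decision;

/* Called when an allocation cannot be satisfied from the current heap. */
gc_decision on_allocation_failure(size_t bytes_allocd_since_gc,
                                  size_t heap_size,
                                  size_t free_space_divisor)
{
    assert(free_space_divisor > 0);
    /* Little allocation since the last collection: a collection would
       probably reclaim little, so grow the heap instead. */
    if (bytes_allocd_since_gc < heap_size / free_space_divisor)
        return EXPAND_HEAP;
    return COLLECT;
}
```

For example, with a 1000-byte heap and a divisor of 4, a collection is
started only once at least 250 bytes have been allocated since the previous
one.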
  143. <P>
  144. The above is in fact an oversimplification of the real heap expansion
  145. and GC triggering heuristic, which adjusts slightly for root size
  146. and certain kinds of
  147. fragmentation. In particular:
  148. <UL>
  149. <LI> Programs with a large root set size and
  150. little live heap memory will expand the heap to amortize the cost of
  151. scanning the roots.
  152. <LI> Versions 5.x of the collector actually collect more frequently in
  153. nonincremental mode. The large block allocator usually refuses to split
  154. large heap blocks once the garbage collection threshold is
  155. reached. This often has the effect of collecting well before the
  156. heap fills up, thus reducing fragmentation and working set size at the
  157. expense of GC time. Versions 6.x choose an intermediate strategy depending
  158. on how much large object allocation has taken place in the past.
  159. (If the collector is configured to unmap unused pages, versions 6.x
  160. use the 5.x strategy.)
  161. <LI> In calculating the amount of allocation since the last collection we
  162. give partial credit for objects we expect to be explicitly deallocated.
  163. Even if all objects are explicitly managed, it is often desirable to collect
  164. on rare occasion, since that is our only mechanism for coalescing completely
  165. empty chunks.
  166. </ul>
  167. <P>
  168. It has been suggested that this should be adjusted so that we favor
  169. expansion if the resulting heap still fits into physical memory.
  170. In many cases, that would no doubt help. But it is tricky to do this
  171. in a way that remains robust if multiple applications are contending
  172. for a single pool of physical memory.
  173. <H2>Mark phase</h2>
  174. At each collection, the collector marks all objects that are
  175. possibly reachable from pointer variables. Since it cannot generally
  176. tell where pointer variables are located, it scans the following
  177. <I>root segments</i> for pointers:
  178. <UL>
  179. <LI>The registers. Depending on the architecture, this may be done using
  180. assembly code, or by calling a <TT>setjmp</tt>-like function which saves
  181. register contents on the stack.
  182. <LI>The stack(s). In the case of a single-threaded application,
  183. on most platforms this
  184. is done by scanning the memory between (an approximation of) the current
  185. stack pointer and <TT>GC_stackbottom</tt>. (For Itanium, the register stack
  186. is scanned separately.) The <TT>GC_stackbottom</tt> variable is set in
  187. a highly platform-specific way depending on the appropriate configuration
  188. information in <TT>gcconfig.h</tt>. Note that the currently active
  189. stack needs to be scanned carefully, since callee-save registers of
  190. client code may appear inside collector stack frames, which may
  191. change during the mark process. This is addressed by scanning
  192. some sections of the stack "eagerly", effectively capturing a snapshot
  193. at one point in time.
  194. <LI>Static data region(s). In the simplest case, this is the region
  195. between <TT>DATASTART</tt> and <TT>DATAEND</tt>, as defined in
  196. <TT>gcconfig.h</tt>. However, in most cases, this will also involve
  197. static data regions associated with dynamic libraries. These are
  198. identified by the mostly platform-specific code in <TT>dyn_load.c</tt>.
  199. </ul>
  200. The marker maintains an explicit stack of memory regions that are known
  201. to be accessible, but that have not yet been searched for contained pointers.
  202. Each stack entry contains the starting address of the block to be scanned,
  203. as well as a descriptor of the block. If no layout information is
  204. available for the block, then the descriptor is simply a length.
  205. (For other possibilities, see <TT>gc_mark.h</tt>.)
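<P>
A mark stack of this kind might be sketched as below. The type and function
names are hypothetical, the descriptor is always a plain length, and the
capacity is kept tiny on purpose; the collector's own definitions are in
<TT>gc_mark.h</tt>.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    void  *start;   /* first address of the region still to be scanned */
    size_t descr;   /* descriptor; in this sketch always a length in bytes */
} mark_entry;

#define MARK_STACK_ENTRIES 4   /* hypothetical capacity */

static mark_entry mark_stack[MARK_STACK_ENTRIES];
static size_t mark_stack_top = 0;   /* number of entries in use */

/* Push a region known to be accessible; returns 0 if the stack is full. */
int push_region(void *start, size_t length)
{
    assert(mark_stack_top <= MARK_STACK_ENTRIES);
    if (mark_stack_top == MARK_STACK_ENTRIES)
        return 0;
    mark_stack[mark_stack_top].start = start;
    mark_stack[mark_stack_top].descr = length;
    mark_stack_top++;
    return 1;
}

/* Pop the most recently pushed region; returns 0 if the stack is empty. */
int pop_region(mark_entry *out)
{
    if (mark_stack_top == 0)
        return 0;
    *out = mark_stack[--mark_stack_top];
    return 1;
}
```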
  206. <P>
  207. At the beginning of the mark phase, all root segments
  208. (as described above) are pushed on the
  209. stack by <TT>GC_push_roots</tt>. (Registers and eagerly processed
  210. stack sections are processed by pushing the referenced objects instead
  211. of the stack section itself.) If <TT>ALL_INTERIOR_PTRS</tt> is not
  212. defined, then stack roots require special treatment. In this case, the
  213. normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt>
  214. explicitly checks for interior pointers and pushes descriptors for target
  215. objects.
  216. <P>
  217. The marker is structured to allow incremental marking.
  218. Each call to <TT>GC_mark_some</tt> performs a small amount of
  219. work towards marking the heap.
  220. It maintains
  221. explicit state in the form of <TT>GC_mark_state</tt>, which
  222. identifies a particular sub-phase. Some other pieces of state, most
  223. notably the mark stack, identify how much work remains to be done
  224. in each sub-phase. The normal progression of mark states for
  225. a stop-the-world collection is:
  226. <OL>
  227. <LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked
  228. objects. In this case <TT>GC_objects_are_marked</tt> will simultaneously
  229. be false, so the mark state is advanced to
  230. <LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push
  231. uncollectable objects, roots, and then mark everything reachable from them.
  232. <TT>Scan_ptr</tt> is advanced through the heap until all uncollectable
  233. objects are pushed, and objects reachable from them are marked.
  234. At that point, the next call to <TT>GC_mark_some</tt> calls
  235. <TT>GC_push_roots</tt> to push the roots. It then advances the
  236. mark state to
  237. <LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is
  238. empty, all reachable objects are marked. Once in this state, we work
  239. only on emptying the mark stack. Once this is completed, the state
  240. changes to
  241. <LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
  242. </ol>
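<P>
The progression above can be read as a small state machine. The sketch below
uses the state names from the list, but the numeric values, the function
<TT>next_mark_state</tt>, and its two flag parameters are assumptions made
for illustration; the real <TT>GC_mark_some</tt> also performs the marking
work itself.

```c
#include <assert.h>

/* Names follow the text; the numeric values are arbitrary here. */
typedef enum {
    MS_NONE,
    MS_PUSH_UNCOLLECTABLE,
    MS_ROOTS_PUSHED,
    MS_INVALID
} mark_state_t;

/* One step of the stop-the-world progression.
   uncollectable_pushed: all uncollectable objects and the roots are pushed.
   mark_stack_empty:     nothing remains on the mark stack. */
mark_state_t next_mark_state(mark_state_t s,
                             int uncollectable_pushed,
                             int mark_stack_empty)
{
    switch (s) {
    case MS_INVALID:
        return MS_PUSH_UNCOLLECTABLE;
    case MS_PUSH_UNCOLLECTABLE:
        return uncollectable_pushed ? MS_ROOTS_PUSHED : MS_PUSH_UNCOLLECTABLE;
    case MS_ROOTS_PUSHED:
        return mark_stack_empty ? MS_NONE : MS_ROOTS_PUSHED;
    case MS_NONE:
        return MS_NONE;
    }
    assert(!"unknown mark state");
    return MS_INVALID;
}
```

Calling the function repeatedly until it yields <TT>MS_NONE</tt> mirrors
repeated calls to <TT>GC_mark_some</tt>.
<P>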
  243. The core mark routine, <TT>GC_mark_from</tt>, is called
  244. repeatedly by several of the sub-phases when the mark stack starts to fill
  245. up. It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
  246. to empty the mark stack.
  247. The routine is designed to only perform a limited amount of marking at
  248. each call, so that it can also be used by the incremental collector.
  249. It is fairly carefully tuned, since it usually consumes a large majority
  250. of the garbage collection time.
  251. <P>
  252. The fact that it performs only a small amount of work per call also
  253. allows it to be used as the core routine of the parallel marker. In that
  254. case it is normally invoked on thread-private mark stacks instead of the
  255. global mark stack. More details can be found in
  256. <A HREF="scale.html">scale.html</a>.
  257. <P>
  258. The marker correctly handles mark stack overflows. Whenever the mark stack
  259. overflows, the mark state is reset to <TT>MS_INVALID</tt>.
  260. Since there are already marked objects in the heap,
  261. this eventually forces a complete
  262. scan of the heap, searching for pointers, during which any unmarked objects
  263. referenced by marked objects are again pushed on the mark stack. This
  264. process is repeated until the mark phase completes without a stack overflow.
  265. Each time the stack overflows, an attempt is made to grow the mark stack.
  266. All pieces of the collector that push regions onto the mark stack have to be
  267. careful to ensure forward progress, even in case of repeated mark stack
  268. overflows. Every mark attempt results in additional marked objects.
  269. <P>
  270. Each mark stack entry is processed by examining all candidate pointers
  271. in the range described by the entry. If the region has no associated
  272. type information, then this typically requires that each 4-byte aligned
  273. quantity (8-byte aligned with 64-bit pointers) be considered a candidate
  274. pointer.
  275. <P>
  276. We determine whether a candidate pointer is actually the address of
  277. a heap block. This is done in the following steps:
  278. <OL>
  279. <LI> The candidate pointer is checked against rough heap bounds.
  280. These heap bounds are maintained such that all actual heap objects
  281. fall between them. In order to facilitate black-listing (see below)
  282. we also include address regions that the heap is likely to expand into.
  283. Most non-pointers fail this initial test.
  284. <LI> The candidate pointer is divided into two pieces; the most significant
  285. bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and
  286. the least significant bits specify an offset within that page.
  287. (A hardware page may actually consist of multiple such pages.
  288. <TT>HBLKSIZE</tt> is usually the page size divided by a small power of two.)
  289. <LI>
  290. The page address part of the candidate pointer is looked up in a
  291. <A HREF="tree.html">table</a>.
  292. Each table entry contains either 0, indicating that the page is not part
  293. of the garbage collected heap, a small integer <I>n</i>, indicating
  294. that the page is part of a large object, starting at least <I>n</i> pages
  295. back, or a pointer to a descriptor for the page. In the first case,
  296. the candidate pointer is not a true pointer and can be safely ignored.
  297. In the last two cases, we can obtain a descriptor for the page containing
  298. the beginning of the object.
  299. <LI>
  300. The starting address of the referenced object is computed.
  301. The page descriptor contains the size of the object(s)
  302. in that page, the object kind, and the necessary mark bits for those
  303. objects. The size information can be used to map the candidate pointer
  304. to the object starting address. To accelerate this process, the page header
  305. also contains a pointer to a precomputed map of page offsets to displacements
  306. from the beginning of an object. The use of this map avoids a
  307. potentially slow integer remainder operation in computing the object
  308. start address.
  309. <LI>
  310. The mark bit for the target object is checked and set. If the object
  311. was previously unmarked, the object is pushed on the mark stack.
  312. The descriptor is read from the page descriptor. (This is computed
  313. from information in <TT>GC_obj_kinds</tt> when the page is first allocated.)
  314. </ol>
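<P>
The steps above can be sketched on a toy heap as follows. Everything here is
an assumption made for illustration: addresses are plain integers in a
synthetic range, the page table is a flat array rather than the tree used by
the collector, large objects are not handled, one mark byte is kept per
object, and the object start is found with a remainder operation rather than
the precomputed displacement map. The names <TT>page_header</tt>,
<TT>page_table</tt> and <TT>mark_candidate</tt> are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_HBLKSIZE  12
#define HBLKSIZE      4096u   /* assumed value, equal to 1 << LOG_HBLKSIZE */
#define MIN_OBJ_BYTES 16      /* assumed smallest object size */
#define N_PAGES       8       /* toy heap of 8 pages */
#define HEAP_LO ((uintptr_t)0x10000)            /* HBLKSIZE-aligned */
#define HEAP_HI (HEAP_LO + N_PAGES * HBLKSIZE)

typedef struct {
    size_t obj_size;                                /* bytes per object */
    unsigned char marks[HBLKSIZE / MIN_OBJ_BYTES];  /* one mark per object */
} page_header;

page_header *page_table[N_PAGES];   /* NULL: page is not part of the heap */

/* Returns 1 if the candidate refers to a previously unmarked object, which
   the caller should then push on the mark stack.  On a valid hit, whether
   newly marked or not, *obj_start receives the object's start address. */
int mark_candidate(uintptr_t candidate, uintptr_t *obj_start)
{
    /* Step 1: rough heap bounds. */
    if (candidate < HEAP_LO || candidate >= HEAP_HI)
        return 0;
    /* Step 2: split into page number and offset within the page. */
    uintptr_t page = (candidate >> LOG_HBLKSIZE) - (HEAP_LO >> LOG_HBLKSIZE);
    uintptr_t offset = candidate & (HBLKSIZE - 1);
    /* Step 3: table lookup. */
    page_header *hdr = page_table[page];
    if (hdr == NULL)
        return 0;
    assert(hdr->obj_size >= MIN_OBJ_BYTES && hdr->obj_size <= HBLKSIZE);
    /* Step 4: object start; the tail of a page may hold no whole object. */
    uintptr_t index = offset / hdr->obj_size;
    if ((index + 1) * hdr->obj_size > HBLKSIZE)
        return 0;
    *obj_start = candidate - offset % hdr->obj_size;
    /* Step 5: check and set the mark. */
    if (hdr->marks[index])
        return 0;
    hdr->marks[index] = 1;
    return 1;
}
```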
  315. <P>
  316. At the end of the mark phase, mark bits for left-over free lists are cleared,
  317. in case a free list was accidentally marked due to a stray pointer.
  318. <H2>Sweep phase</h2>
  319. At the end of the mark phase, all blocks in the heap are examined.
  320. Unmarked large objects are immediately returned to the large object free list.
  321. Each small object page is checked to see if all mark bits are clear.
  322. If so, the entire page is returned to the large object free list.
  323. Small object pages containing some reachable object are queued for later
  324. sweeping, unless we determine that the page contains very little free
  325. space, in which case it is not examined further.
  326. <P>
  327. This initial sweep pass touches only block headers, not
  328. the blocks themselves. Thus it does not require significant paging, even
  329. if large sections of the heap are not in physical memory.
  330. <P>
  331. Nonempty small object pages are swept when an allocation attempt
  332. encounters an empty free list for that object size and kind.
  333. Pages for the correct size and kind are repeatedly swept until at
  334. least one empty block is found. Sweeping such a page involves
  335. scanning the mark bit array in the page header, and building a free
  336. list linked through the first words in the objects themselves.
  337. This does involve touching the appropriate data page, but in most cases
  338. it will be touched only just before it is used for allocation.
  339. Hence any paging is essentially unavoidable.
  340. <P>
  341. Except in the case of pointer-free objects, we maintain the invariant
  342. that any object in a small object free list is cleared (except possibly
  343. for the link field). Thus it becomes the burden of the small object
  344. sweep routine to clear objects. This has the advantage that we can
  345. easily recover from accidentally marking a free list, though that could
  346. also be handled by other means. The collector currently spends a fair
  347. amount of time clearing objects, and this approach should probably be
  348. revisited.
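<P>
A minimal sketch of sweeping one small object page is given below. It is not
the collector's sweep routine: the page is a toy array of 64 words, the
header keeps one mark byte per word index rather than packed bits, and the
names <TT>page_header</tt> and <TT>sweep_page</tt> are hypothetical. It shows
the two points made above: the free list is threaded through the first word
of each free object, and free objects are cleared apart from that link.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_WORDS 64   /* hypothetical page size, in words */

typedef struct {
    size_t obj_words;                 /* object size in words */
    unsigned char marks[PAGE_WORDS];  /* marks[i]: object starting at word i */
} page_header;

/* Sweep one page.  Each unmarked object is cleared and prepended to
   free_list; marked objects are left untouched.  Returns the new head. */
void *sweep_page(void **page, const page_header *hdr, void *free_list)
{
    assert(hdr->obj_words >= 1 && hdr->obj_words <= PAGE_WORDS);
    for (size_t i = 0; i + hdr->obj_words <= PAGE_WORDS; i += hdr->obj_words) {
        if (hdr->marks[i])
            continue;                 /* reachable object */
        memset(&page[i], 0, hdr->obj_words * sizeof(void *));
        page[i] = free_list;          /* link through the first word */
        free_list = &page[i];
    }
    return free_list;
}
```

Only the mark array is read for live objects, so the data page is written
only where an object is actually reclaimed.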
  349. <P>
  350. In most configurations, we use specialized sweep routines to handle common
  351. small object sizes. Since we allocate one mark bit per word, it becomes
  352. easier to examine the relevant mark bits if the object size divides
  353. the word length evenly. We also suitably unroll the inner sweep loop
  354. in each case. (It is conceivable that profile-based procedure cloning
  355. in the compiler could make this unnecessary and counterproductive. I
  356. know of no existing compiler to which this applies.)
  357. <P>
  358. The sweeping of small object pages could be avoided completely at the expense
  359. of examining mark bits directly in the allocator. This would probably
  360. be more expensive, since each allocation call would have to reload
  361. a large amount of state (e.g. next object address to be swept, position
  362. in mark bit table) before it could do its work. The current scheme
  363. keeps the allocator simple and allows useful optimizations in the sweeper.
  364. <H2>Finalization</h2>
  365. Both <TT>GC_register_disappearing_link</tt> and
  366. <TT>GC_register_finalizer</tt> add the request to a corresponding hash
  367. table. The hash table is allocated out of collected memory, but
  368. the reference to the finalizable object is hidden from the collector.
  369. Currently finalization requests are processed non-incrementally at the
  370. end of a mark cycle.
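<P>
One common way to hide a reference from a conservative scan is to store the
bitwise complement of the address, which no longer looks like a pointer into
the heap. The sketch below illustrates that technique only; the names are
hypothetical and it does not claim to reproduce the layout of the
collector's finalization tables.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t hidden_ptr;   /* an address in disguised form */

/* Disguise an address so that a conservative scan will not treat the
   stored value as a reference to the object. */
hidden_ptr hide_pointer(void *p)
{
    return ~(uintptr_t)p;
}

/* Recover the original address when the table entry is processed. */
void *reveal_pointer(hidden_ptr h)
{
    return (void *)~h;
}

/* Sanity check: revealing a hidden pointer yields the original. */
void check_round_trip(void *p)
{
    assert(reveal_pointer(hide_pointer(p)) == p);
}
```

A table entry holding only the hidden value does not by itself keep the
finalizable object alive.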
  371. <P>
  372. The collector makes an initial pass over the table of finalizable objects,
  373. pushing the contents of unmarked objects onto the mark stack.
  374. After pushing each object, the marker is invoked to mark all objects
  375. reachable from it. The object itself is not explicitly marked.
  376. This assures that objects on which a finalizer depends are neither
  377. collected nor finalized.
  378. <P>
  379. If in the process of marking from an object the
  380. object itself becomes marked, we have uncovered
  381. a cycle involving the object. This usually results in a warning from the
  382. collector. Such objects are not finalized, since it may be
  383. unsafe to do so. See the more detailed
  384. <A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html"> discussion of finalization semantics</a>.
  385. <P>
  386. Any objects remaining unmarked at the end of this process are added to
  387. a queue of objects whose finalizers can be run. Depending on collector
  388. configuration, finalizers are dequeued and run either implicitly during
  389. allocation calls, or explicitly in response to a user request.
  390. (Note that the former is unfortunately both the default and not generally safe.
  391. If finalizers perform synchronization, it may result in deadlocks.
  392. Nontrivial finalizers generally need to perform synchronization, and
  393. thus require a different collector configuration.)
  394. <P>
  395. The collector provides a mechanism for replacing the procedure that is
  396. used to mark through objects. This is used both to provide support for
  397. Java-style unordered finalization, and to ignore certain kinds of cycles,
  398. <I>e.g.</i> those arising from C++ implementations of virtual inheritance.
  399. <H2>Generational Collection and Dirty Bits</h2>
  400. We basically use the concurrent and generational GC algorithm described in
  401. <A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi91.ps.Z">"Mostly Parallel Garbage Collection"</a>,
  402. by Boehm, Demers, and Shenker.
  403. <P>
  404. The most significant modification is that
  405. the collector always starts running in the allocating thread.
  406. There is no separate garbage collector thread. (If parallel GC is
  407. enabled, helper threads may also be woken up.)
  408. If an allocation attempt either requests a large object, or encounters
  409. an empty small object free list, and notices that there is a collection
  410. in progress, it immediately performs a small amount of marking work
  411. as described above.
  412. <P>
  413. This change was made both because we wanted to easily accommodate
  414. single-threaded environments, and because a separate GC thread requires
  415. very careful control over the scheduler to prevent the mutator from
  416. out-running the collector, and hence provoking unneeded heap growth.
  417. <P>
  418. In incremental mode, the heap is always expanded when we encounter
  419. insufficient space for an allocation. Garbage collection is triggered
  420. whenever we notice that more than
  421. <TT>GC_heap_size</tt> / (2 * <TT>GC_free_space_divisor</tt>)
  422. bytes of allocation have taken place.
  423. After <TT>GC_full_freq</tt> minor collections a major collection
  424. is started.
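<P>
The triggering rule can be sketched as below, reading the threshold as the
heap size divided by twice the free space divisor. The structure, field and
function names are hypothetical; only the roles of
<TT>GC_free_space_divisor</tt> and <TT>GC_full_freq</tt> are taken from the
text, and the real heuristic has further adjustments.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NO_COLLECTION, MINOR_COLLECTION, MAJOR_COLLECTION } collection_kind;

typedef struct {
    size_t   heap_size;
    size_t   free_space_divisor;
    unsigned full_freq;            /* minor collections between major ones */
    unsigned minors_since_major;
} trigger_state;

/* Decide what to start, given the bytes allocated since the last
   collection. */
collection_kind check_trigger(trigger_state *st, size_t bytes_allocd)
{
    assert(st->free_space_divisor > 0);
    if (bytes_allocd <= st->heap_size / (2 * st->free_space_divisor))
        return NO_COLLECTION;
    if (st->minors_since_major >= st->full_freq) {
        st->minors_since_major = 0;
        return MAJOR_COLLECTION;
    }
    st->minors_since_major++;
    return MINOR_COLLECTION;
}
```

With <TT>full_freq</tt> set to 2, successive triggers yield two minor
collections followed by a major one.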
  425. <P>
  426. All collections initially run uninterrupted until a predetermined
  427. amount of time (50 msecs by default) has expired. If this allows
  428. the collection to complete entirely, we can avoid correcting
  429. for data structure modifications during the collection. If it does
  430. not complete, we return control to the mutator, and perform small
  431. amounts of additional GC work during those later allocations that
  432. cannot be satisfied from small object free lists. When marking completes,
  433. the set of modified pages is retrieved, and we mark once again from
  434. marked objects on those pages, this time with the mutator stopped.
  435. <P>
  436. We keep track of modified pages using one of several distinct mechanisms:
  437. <OL>
  438. <LI>
  439. Through explicit mutator cooperation. Currently this requires
  440. the use of <TT>GC_malloc_stubborn</tt>, and is rarely used.
  441. <LI>
  442. (<TT>MPROTECT_VDB</tt>) By write-protecting physical pages and
  443. catching write faults. This is
  444. implemented for many Unix-like systems and for win32. It is not possible
  445. in a few environments.
  446. <LI>
  447. (<TT>PROC_VDB</tt>) By retrieving dirty bit information from /proc.
  448. (Currently only Sun's
  449. Solaris supports this. Though this is considerably cleaner, performance
  450. may actually be better with mprotect and signals.)
  451. <LI>
  452. (<TT>PCR_VDB</tt>) By relying on an external dirty bit implementation, in this
  453. case the one in Xerox PCR.
  454. <LI>
  455. (<TT>DEFAULT_VDB</tt>) By treating all pages as dirty. This is the default if
  456. none of the other techniques is known to be usable, and
  457. <TT>GC_malloc_stubborn</tt> is not used. Practical only for testing, or if
  458. the vast majority of objects use <TT>GC_malloc_stubborn</tt>.
  459. </ol>
  460. <H2>Black-listing</h2>
  461. The collector implements <I>black-listing</i> of pages, as described
  462. in
  463. <A HREF="http://www.acm.org/pubs/citations/proceedings/pldi/155090/p197-boehm/">
  464. Boehm, ``Space Efficient Conservative Collection'', PLDI '93</a>, also available
  465. <A HREF="papers/pldi93.ps.Z">here</a>.
  466. <P>
  467. During the mark phase, the collector tracks ``near misses'', i.e. attempts
  468. to follow a ``pointer'' to just outside the garbage-collected heap, or
  469. to a currently unallocated page inside the heap. Pages that have been
  470. the targets of such near misses are likely to be the targets of
  471. misidentified ``pointers'' in the future. To minimize the future
  472. damage caused by such misidentifications they will be allocated only to
  473. small pointerfree objects.
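<P>
The policy in the preceding paragraph might be sketched as follows. The
per-page flag array and both function names are hypothetical, and the sketch
ignores the distinction between the two kinds of black-listing.

```c
#include <assert.h>
#include <stddef.h>

#define N_PAGES 64   /* hypothetical number of pages tracked */

static unsigned char black_listed[N_PAGES];

/* Record that a misidentified "pointer" referred to this page. */
void record_near_miss(size_t page)
{
    assert(page < N_PAGES);
    black_listed[page] = 1;
}

/* May this page be handed out for objects of the given description?
   Black-listed pages are acceptable only for small pointer-free objects. */
int page_usable_for(size_t page, int is_small, int is_pointer_free)
{
    assert(page < N_PAGES);
    if (!black_listed[page])
        return 1;
    return is_small && is_pointer_free;
}
```

A false reference to a small pointer-free object retains at most that one
object, which is why such objects are the safest use of a black-listed page.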
  474. <P>
  475. The collector understands two different kinds of black-listing. A
  476. page may be black listed for interior pointer references
  477. (<TT>GC_add_to_black_list_stack</tt>), if it was the target of a near
  478. miss from a location that requires interior pointer recognition,
  479. <I>e.g.</i> the stack, or the heap if <TT>GC_all_interior_pointers</tt>
  480. is set. In this case, we also avoid allocating large blocks that include
  481. this page.
  482. <P>
  483. If the near miss came from a source that did not require interior
  484. pointer recognition, it is black-listed with
  485. <TT>GC_add_to_black_list_normal</tt>.
  486. A page black-listed in this way may appear inside a large object,
  487. so long as it is not the first page of a large object.
  488. <P>
  489. The <TT>GC_allochblk</tt> routine respects black-listing when assigning
  490. a block to a particular object kind and size. It occasionally
  491. drops (i.e. allocates and forgets) blocks that are completely black-listed
  492. in order to avoid excessively long large block free lists containing
  493. only unusable blocks. This would otherwise become an issue
  494. if there is low demand for small pointerfree objects.
  495. <H2>Thread support</h2>
  496. We support several different threading models. Unfortunately Pthreads,
  497. the only reasonably well standardized thread model, supports too narrow
  498. an interface for conservative garbage collection. There appears to be
  499. no completely portable way to allow the collector
  500. to coexist with various Pthreads
  501. implementations. Hence we currently support only the more
  502. common Pthreads implementations.
  503. <P>
  504. In particular, it is very difficult for the collector to stop all other
  505. threads in the system and examine the register contents. This is currently
  506. accomplished with very different mechanisms for some Pthreads
  507. implementations. The Solaris implementation temporarily disables much
  508. of the user-level threads implementation by stopping kernel-level threads
  509. ("lwp"s). The Linux/HPUX/OSF1 and Irix implementations send signals to
  510. individual Pthreads and have them wait in the signal handler.
  511. <P>
  512. The Linux and Irix implementations use
  513. only documented Pthreads calls, but rely on extensions to their semantics.
  514. The Linux implementation <TT>linux_threads.c</tt> relies on only very
  515. mild extensions to the pthreads semantics, and already supports a large number
  516. of other Unix-like pthreads implementations. Our goal is to make this the
  517. only pthread support in the collector.
  518. <P>
  519. (The Irix implementation is separate only for historical reasons and should
  520. clearly be merged. The current Solaris implementation probably performs
  521. better in the uniprocessor case, but does not support thread operations in the
  522. collector. Hence it cannot support the parallel marker.)
  523. <P>
  524. All implementations must
  525. intercept thread creation and a few other thread-specific calls to allow
  526. enumeration of threads and location of thread stacks. This is currently
  527. accomplished with <TT># define</tt>'s in <TT>gc.h</tt>
  528. (really <TT>gc_pthread_redirects.h</tt>), or optionally
  529. by using ld's function call wrapping mechanism under Linux.
  530. <P>
  531. Recent versions of the collector support several facilities to enhance
  532. the processor-scalability and thread performance of the collector.
  533. These are discussed in more detail <A HREF="scale.html">here</a>.
  534. <P>
  535. Comments are appreciated. Please send mail to
  536. <A HREF="mailto:boehm@acm.org"><TT>boehm@acm.org</tt></a> or
  537. <A HREF="mailto:Hans.Boehm@hp.com"><TT>Hans.Boehm@hp.com</tt></a>.
  538. <P>
  539. This is a modified copy of a page written while the author was at SGI.
  540. The original was <A HREF="http://reality.sgi.com/boehm/gcdescr.html">here</a>.
  541. </body>
  542. </html>