Simon Marlow [Wed, 16 Apr 2008 23:23:55 +0000 (23:23 +0000)]
update copyrights in rts/sm
Simon Marlow [Wed, 16 Apr 2008 23:22:32 +0000 (23:22 +0000)]
Reorganisation to fix problems related to the gct register variable
- GCAux.c contains code not compiled with the gct register enabled;
it is callable from outside the GC
- marking functions are moved to their relevant subsystems, outside
the GC
- mark_root needs to save the gct register, as it is called from
outside the GC
Simon Marlow [Wed, 16 Apr 2008 22:45:41 +0000 (22:45 +0000)]
faster block allocator, by dividing the free list into buckets
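A minimal sketch of the bucketing idea, not the actual allocator code. The commit only says the free list is divided into buckets; bucketing by floor(log2(size)), and the names NUM_BUCKETS, group, bucket_of, free_group and alloc_group, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUCKETS 8   /* hypothetical; the last bucket also holds all larger groups */

typedef struct group {
    size_t n_blocks;            /* size of this free group, in blocks */
    struct group *next;
} group;

static group *buckets[NUM_BUCKETS];

/* Bucket index is floor(log2(n)), clamped to the last bucket. */
static int bucket_of(size_t n)
{
    int i = 0;
    assert(n >= 1);
    while (n >>= 1) i++;
    return i < NUM_BUCKETS ? i : NUM_BUCKETS - 1;
}

/* Return a group to the free list by pushing it onto its bucket. */
static void free_group(group *g)
{
    int i = bucket_of(g->n_blocks);
    g->next = buckets[i];
    buckets[i] = g;
}

/* Find a free group of at least n blocks. The search starts at the
 * bucket for n and skips every bucket that only holds smaller groups. */
static group *alloc_group(size_t n)
{
    for (int i = bucket_of(n); i < NUM_BUCKETS; i++) {
        for (group **p = &buckets[i]; *p != NULL; p = &(*p)->next) {
            if ((*p)->n_blocks >= n) {
                group *g = *p;
                *p = g->next;   /* unlink from the bucket */
                g->next = NULL;
                return g;
            }
        }
    }
    return NULL;                /* nothing large enough is free */
}
```

The saving over a single list is that a request never walks past groups that are too small to satisfy it, apart from those sharing its own bucket.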
Simon Marlow [Wed, 16 Apr 2008 22:38:24 +0000 (22:38 +0000)]
allocate more blocks in one go, to reduce contention for the block allocator
Simon Marlow [Wed, 16 Apr 2008 22:25:39 +0000 (22:25 +0000)]
measure GC(0/1) times and work imbalance
Simon Marlow [Wed, 16 Apr 2008 22:23:19 +0000 (22:23 +0000)]
remove outdated comment
Simon Marlow [Wed, 16 Apr 2008 22:15:16 +0000 (22:15 +0000)]
calculate and report slop (wasted space at the end of blocks)
Simon Marlow [Wed, 16 Apr 2008 22:13:56 +0000 (22:13 +0000)]
free empty blocks at the end of GC
Simon Marlow [Wed, 16 Apr 2008 22:13:31 +0000 (22:13 +0000)]
move the scan block pointer into the gct structure
Simon Marlow [Wed, 16 Apr 2008 22:12:24 +0000 (22:12 +0000)]
improvements to +RTS -s output
- count and report number of parallel collections
- calculate bytes scanned in addition to bytes copied per thread
- calculate "work balance factor"
- tidy up the formatting a bit
Simon Marlow [Wed, 16 Apr 2008 22:10:02 +0000 (22:10 +0000)]
wait for threads to start up properly
Simon Marlow [Wed, 16 Apr 2008 22:08:07 +0000 (22:08 +0000)]
debug output tweaks
Simon Marlow [Wed, 16 Apr 2008 22:06:20 +0000 (22:06 +0000)]
Keep track of an accurate count of live words in each step
This means we can calculate slop easily, and also improve
predictability of GC.
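A rough sketch of why a live-word count makes slop easy to compute: slop is whatever the step's blocks can hold minus what is actually live. The struct, the field names and the BLOCK_SIZE_W value (4 KB blocks with 8-byte words) are assumptions for illustration, not the RTS definitions.

```c
#include <stddef.h>

#define BLOCK_SIZE_W 512   /* assumed: words per block */

typedef struct {
    size_t n_blocks;       /* blocks owned by the step */
    size_t n_words;        /* live words in the step */
} step_counts;

/* Slop is the capacity of the step's blocks minus the live words. */
static size_t step_slop(const step_counts *s)
{
    return s->n_blocks * BLOCK_SIZE_W - s->n_words;
}
```

For example, a step with 3 blocks and 1500 live words has 3 * 512 - 1500 = 36 words of slop.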
Simon Marlow [Wed, 16 Apr 2008 22:03:47 +0000 (22:03 +0000)]
Allow work units smaller than a block to improve load balancing
Simon Marlow [Wed, 16 Apr 2008 22:01:04 +0000 (22:01 +0000)]
in scavenge_block1(), we can use the lock-free recordMutableGen()
Simon Marlow [Wed, 16 Apr 2008 21:59:45 +0000 (21:59 +0000)]
update the debug counters following changes to scav_find_work()
Simon Marlow [Wed, 16 Apr 2008 21:58:15 +0000 (21:58 +0000)]
change the find-work strategy: use oldest-first consistently
Simon Marlow [Wed, 16 Apr 2008 21:57:41 +0000 (21:57 +0000)]
per-thread debug output when using multiple threads, not just major gc
Simon Marlow [Wed, 16 Apr 2008 21:56:49 +0000 (21:56 +0000)]
small debug output improvements
Simon Marlow [Wed, 16 Apr 2008 21:55:03 +0000 (21:55 +0000)]
allow parallel minor collections too
Simon Marlow [Wed, 16 Apr 2008 21:54:05 +0000 (21:54 +0000)]
Specialise evac/scav for single-threaded, not minor, GC
So we can parallelise minor collections too. Sometimes it's worth it.
Simon Marlow [Wed, 16 Apr 2008 21:53:25 +0000 (21:53 +0000)]
move usleep(1) to gc_thread_work() from any_work()
Simon Marlow [Wed, 16 Apr 2008 21:52:45 +0000 (21:52 +0000)]
use RTS_VAR()
Simon Marlow [Wed, 16 Apr 2008 21:51:09 +0000 (21:51 +0000)]
treat the global work list as a queue rather than a stack
Simon Marlow [Wed, 16 Apr 2008 21:48:25 +0000 (21:48 +0000)]
GC: move static object processing into thread-local storage

Simon Marlow [Wed, 16 Apr 2008 21:40:23 +0000 (21:40 +0000)]
tmp: usleep(1) during anyWork() if no work
Simon Marlow [Wed, 16 Apr 2008 21:39:45 +0000 (21:39 +0000)]
anyWork(): count the number of times we don't find any work
Simon Marlow [Wed, 16 Apr 2008 21:35:32 +0000 (21:35 +0000)]
stats fixes
Simon Marlow [Wed, 16 Apr 2008 21:35:04 +0000 (21:35 +0000)]
Add +RTS -vg flag for requesting some GC trace messages, outside DEBUG
DEBUG imposes a significant performance hit in the GC, yet we often
want some of the debugging output, so -vg gives us the cheap trace
messages without the sanity checking of DEBUG, just like -vs for the
scheduler.
Simon Marlow [Wed, 16 Apr 2008 21:34:36 +0000 (21:34 +0000)]
GC: rearrange storage to reduce memory accesses in the inner loop
Simon Marlow [Wed, 16 Apr 2008 21:33:58 +0000 (21:33 +0000)]
Add profiling of spinlocks
Simon Marlow [Wed, 16 Apr 2008 21:11:52 +0000 (21:11 +0000)]
rename StgSync to SpinLock
simonmar@microsoft.com [Thu, 28 Feb 2008 15:31:29 +0000 (15:31 +0000)]
Release some of the memory allocated to a stack when it shrinks (#2090)
When a stack is occupying less than 1/4 of the memory it owns, and is
larger than a megablock, we release half of it. Shrinking is O(1): it
doesn't need to copy the stack.
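The rule above can be sketched as a simple predicate. This is an illustration only: the function names are hypothetical, and MBLOCK_WORDS assumes a 1 MB megablock with 8-byte words.

```c
#include <stdbool.h>
#include <stddef.h>

#define MBLOCK_WORDS ((size_t)1 << 17)   /* assumed: 1 MB / 8-byte words */

/* Shrink only if the stack owns more than a megablock and is using
 * less than a quarter of what it owns. */
static bool should_shrink(size_t used_words, size_t owned_words)
{
    return owned_words > MBLOCK_WORDS && used_words < owned_words / 4;
}

/* New owned size: half is released when the rule fires, otherwise
 * the size is unchanged. */
static size_t shrunk_size(size_t used_words, size_t owned_words)
{
    return should_shrink(used_words, owned_words) ? owned_words / 2
                                                  : owned_words;
}
```

Because less than a quarter is in use, the live part of the stack still fits comfortably in the half that is kept.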
simonmar@microsoft.com [Thu, 28 Feb 2008 15:24:03 +0000 (15:24 +0000)]
scavengeTSO might encounter a ThreadRelocated; cope
simonmar@microsoft.com [Thu, 28 Feb 2008 15:23:32 +0000 (15:23 +0000)]
Updating a thunk in raiseAsync might encounter an IND; cope
There was already a check to avoid updating an IND, but it was
originally there to avoid a bug which doesn't exist now. Furthermore
the test and update are not atomic, so another thread could be
updating this thunk while we are. We have to just go ahead and update
anyway - it might waste a little work, but this is a very rare case.
Simon Marlow [Fri, 22 Feb 2008 14:20:08 +0000 (14:20 +0000)]
add GC(0) and GC(1) time
Simon Marlow [Wed, 20 Feb 2008 13:01:39 +0000 (13:01 +0000)]
round_to_mblocks: should use StgWord not nat
Simon Marlow [Tue, 19 Feb 2008 10:26:51 +0000 (10:26 +0000)]
debugging code
simonmar@microsoft.com [Mon, 18 Feb 2008 13:54:58 +0000 (13:54 +0000)]
refactoring
simonmar@microsoft.com [Fri, 15 Feb 2008 13:40:17 +0000 (13:40 +0000)]
fix off-by-one
simonmar@microsoft.com [Fri, 15 Feb 2008 13:38:50 +0000 (13:38 +0000)]
measure mut_elapsed_time
simonmar@microsoft.com [Fri, 15 Feb 2008 13:38:36 +0000 (13:38 +0000)]
fix build with 6.8
simonmar@microsoft.com [Fri, 15 Feb 2008 13:30:40 +0000 (13:30 +0000)]
add ROUNDUP_BYTES_TO_WDS
simonmar@microsoft.com [Thu, 31 Jan 2008 15:36:45 +0000 (15:36 +0000)]
Allow +RTS -H0 as a way to override a previous -H<size>
simonmar@microsoft.com [Wed, 30 Jan 2008 15:09:34 +0000 (15:09 +0000)]
comment out a bogus assertion
simonmar@microsoft.com [Wed, 30 Jan 2008 15:09:21 +0000 (15:09 +0000)]
memInventory: optionally dump the memory inventory
in addition to checking for leaks
simonmar@microsoft.com [Wed, 30 Jan 2008 15:07:30 +0000 (15:07 +0000)]
calcNeeded: fix the calculation, we weren't counting G0 step 1
simonmar@microsoft.com [Wed, 30 Jan 2008 13:54:18 +0000 (13:54 +0000)]
calcNeeded: add in the large blocks too
Simon Marlow [Wed, 30 Jan 2008 10:15:04 +0000 (10:15 +0000)]
update a comment
simonmar@microsoft.com [Wed, 30 Jan 2008 10:00:47 +0000 (10:00 +0000)]
tell Emacs these files are C
Simon Marlow [Fri, 18 Jan 2008 16:09:10 +0000 (16:09 +0000)]
fix an assertion
Simon Marlow [Wed, 16 Jan 2008 10:37:51 +0000 (10:37 +0000)]
cut-and-pasto
simonmar@microsoft.com [Tue, 15 Jan 2008 09:57:36 +0000 (09:57 +0000)]
small rearrangement
Simon Marlow [Fri, 11 Jan 2008 13:54:53 +0000 (13:54 +0000)]
recordMutableGen_GC: we must call the spinlocked version of allocBlock()
simonmar@microsoft.com [Fri, 11 Jan 2008 10:58:21 +0000 (10:58 +0000)]
remove unused declaration
Simon Marlow [Thu, 10 Jan 2008 12:28:20 +0000 (12:28 +0000)]
more fixes for THUNK_SELECTORs
simonmar@microsoft.com [Thu, 10 Jan 2008 10:56:28 +0000 (10:56 +0000)]
Fix bug in eval_thunk_selector()
Simon Marlow [Wed, 9 Jan 2008 16:28:28 +0000 (16:28 +0000)]
move markSparkQueue into GC.c, as it needs the register variable defined
Simon Marlow [Wed, 9 Jan 2008 16:27:32 +0000 (16:27 +0000)]
Windows fix
Simon Marlow [Wed, 9 Jan 2008 14:49:37 +0000 (14:49 +0000)]
Fix bug: eval_thunk_selector was calling the unlocked evacuate()
simonmar@microsoft.com [Mon, 7 Jan 2008 13:48:38 +0000 (13:48 +0000)]
add GC elapsed time
simonmar@microsoft.com [Thu, 20 Dec 2007 14:58:55 +0000 (14:58 +0000)]
update to match Mb -> MB change in -s output
simonmar@microsoft.com [Tue, 18 Dec 2007 14:51:35 +0000 (14:51 +0000)]
use "MB" rather than "Mb" for abbreviating megabytes
simonmar@microsoft.com [Fri, 14 Dec 2007 13:59:09 +0000 (13:59 +0000)]
findSlop: useful function for tracking down excessive slop in gdb
simonmar@microsoft.com [Fri, 14 Dec 2007 13:58:42 +0000 (13:58 +0000)]
calculate wastage due to unused memory at the end of each block
simonmar@microsoft.com [Fri, 14 Dec 2007 10:32:23 +0000 (10:32 +0000)]
bugfix: check for NULL before testing isPartiallyFull(stp->blocks)
simonmar@microsoft.com [Thu, 13 Dec 2007 16:50:13 +0000 (16:50 +0000)]
have each GC thread call GetRoots()
simonmar@microsoft.com [Thu, 13 Dec 2007 16:45:25 +0000 (16:45 +0000)]
use synchronised version of freeChain() in scavenge_mutable_list()
simonmar@microsoft.com [Thu, 13 Dec 2007 15:09:46 +0000 (15:09 +0000)]
remove declarations for variables that no longer exist
simonmar@microsoft.com [Wed, 12 Dec 2007 16:33:29 +0000 (16:33 +0000)]
remove old comment
simonmar@microsoft.com [Thu, 29 Nov 2007 15:49:27 +0000 (15:49 +0000)]
GC: small improvement to parallelism
don't cache a work block locally if the global queue is empty
simonmar@microsoft.com [Thu, 29 Nov 2007 12:00:21 +0000 (12:00 +0000)]
EVACUATED: target is definitely HEAP_ALLOCED(), no need to check
simonmar@microsoft.com [Tue, 27 Nov 2007 16:07:47 +0000 (16:07 +0000)]
in scavenge_block(), keep going if we're scanning the todo block
simonmar@microsoft.com [Tue, 27 Nov 2007 16:07:17 +0000 (16:07 +0000)]
count the number of todo blocks, and add a trace
simonmar@microsoft.com [Fri, 23 Nov 2007 16:25:22 +0000 (16:25 +0000)]
oops, restore accidentally disabled hash-consing for Char
simonmar@microsoft.com [Thu, 22 Nov 2007 12:23:27 +0000 (12:23 +0000)]
kill the PAR/GRAN debug flags
simonmar@microsoft.com [Thu, 22 Nov 2007 10:50:24 +0000 (10:50 +0000)]
stats: print elapsed time for GC in each generation
simonmar@microsoft.com [Wed, 21 Nov 2007 16:47:36 +0000 (16:47 +0000)]
assertion fix
Simon Marlow [Wed, 21 Nov 2007 15:58:51 +0000 (15:58 +0000)]
cache bd->todo_bd->free and the limit in the workspace
avoids cache contention: bd->todo_bd->free may clash with any cache
line, so we localise it.
simonmar@microsoft.com [Wed, 21 Nov 2007 16:47:47 +0000 (16:47 +0000)]
warning fix
simonmar@microsoft.com [Tue, 20 Nov 2007 13:38:35 +0000 (13:38 +0000)]
fix boundary bugs in a couple of for-loops
simonmar@microsoft.com [Tue, 20 Nov 2007 13:36:35 +0000 (13:36 +0000)]
improvements to PAPI support
- major (multithreaded) GC is measured separately from minor GC
- events to measure can now be specified on the command line, e.g.
prog +RTS -a+PAPI_TOT_CYC
simonmar@microsoft.com [Mon, 19 Nov 2007 11:16:30 +0000 (11:16 +0000)]
use SRC_CC_OPTS rather than SRC_HC_OPTS for C options
Simon Marlow [Thu, 1 Nov 2007 15:03:25 +0000 (15:03 +0000)]
allow PAPI to be installed somewhere non-standard
Simon Marlow [Thu, 1 Nov 2007 15:02:58 +0000 (15:02 +0000)]
fix warnings
Simon Marlow [Thu, 1 Nov 2007 15:02:28 +0000 (15:02 +0000)]
fix a warning
Simon Marlow [Thu, 1 Nov 2007 15:02:00 +0000 (15:02 +0000)]
fix a warning
Simon Marlow [Wed, 31 Oct 2007 16:31:47 +0000 (16:31 +0000)]
rename n_threads to n_gc_threads
Simon Marlow [Wed, 31 Oct 2007 16:30:15 +0000 (16:30 +0000)]
Refactor PAPI support, and add profiling of multithreaded GC
Simon Marlow [Wed, 31 Oct 2007 15:38:39 +0000 (15:38 +0000)]
fix merge errors
Simon Marlow [Wed, 31 Oct 2007 15:34:17 +0000 (15:34 +0000)]
refactoring of eager_promotion in scavenge_block()
Simon Marlow [Wed, 31 Oct 2007 15:33:39 +0000 (15:33 +0000)]
compile special minor GC versions of evacuate() and scavenge_block()
This is for two reasons: minor GCs don't need to do per-object locking
for parallel GC, which is fairly expensive, and secondly minor GCs
don't need to follow SRTs.
Simon Marlow [Wed, 31 Oct 2007 15:32:52 +0000 (15:32 +0000)]
fixes for eval_thunk_selector() in parallel GC
Simon Marlow [Wed, 31 Oct 2007 14:45:42 +0000 (14:45 +0000)]
Remove the optimisation of avoiding scavenging for certain objects
Some objects don't need to be scavenged, in particular if they have no
pointers. This seems like an obvious optimisation, but in fact it
only accounts for about 1% of objects (in GHC, for example), and the
extra complication means it probably isn't worth doing.
Simon Marlow [Wed, 31 Oct 2007 14:42:30 +0000 (14:42 +0000)]
GC refactoring: change evac_gen to evac_step
By establishing an ordering on step pointers, we can simplify the test
(stp->gen_no < evac_gen)
to
(stp < evac_step)
which is common in evacuate().
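A toy illustration of the equivalence, under an assumed layout: all steps live in one array sorted by generation, and evac_step points at the first step of generation evac_gen. The array and the helper names are hypothetical.

```c
#include <stdbool.h>

typedef struct { unsigned gen_no; } step;

/* Assumed layout: every step in one array, ordered by generation. */
static step all_steps[] = { {0}, {0}, {1}, {1}, {2} };

/* Old test: load the generation number, then compare. */
static bool below_gen(const step *stp, unsigned evac_gen)
{
    return stp->gen_no < evac_gen;
}

/* New test: compare the pointers directly, no memory access needed. */
static bool below_step(const step *stp, const step *evac_step)
{
    return stp < evac_step;
}
```

With evac_gen = 1 and evac_step = &all_steps[2], both tests give the same answer for every step in the array, but the pointer form avoids dereferencing stp.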
Simon Marlow [Wed, 31 Oct 2007 14:36:34 +0000 (14:36 +0000)]
GC refactoring: make evacuate() take an StgClosure**
Change the type of evacuate() from
StgClosure *evacuate(StgClosure *);
to
void evacuate(StgClosure **);
So evacuate() itself writes the source pointer, rather than the
caller. This is slightly cleaner, and avoids a few memory writes:
sometimes evacuate() doesn't move the object, and in these cases the
source pointer doesn't need to be written. It doesn't have a
measurable impact on performance, though.
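A toy model of the two calling conventions, to show where the write is saved. The Closure type, its forward field and the pointer_writes counter are invented for this sketch and are not the RTS definitions.

```c
#include <stddef.h>

typedef struct Closure {
    struct Closure *forward;   /* non-NULL if the object has a new location */
    int payload;
} Closure;

static int pointer_writes = 0; /* counts stores made by evacuate_new */

/* Old convention: return the new address; the caller stores it back
 * into the field unconditionally, e.g. field = evacuate_old(field). */
static Closure *evacuate_old(Closure *q)
{
    return q->forward != NULL ? q->forward : q;
}

/* New convention: take the address of the field and update it only
 * when the object really has a new location. */
static void evacuate_new(Closure **p)
{
    Closure *q = *p;
    if (q->forward != NULL) {
        *p = q->forward;
        pointer_writes++;
    }
}
```

With the new form, a field that points at an object which does not move is left untouched, so no store is issued for it.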
Simon Marlow [Wed, 31 Oct 2007 13:09:35 +0000 (13:09 +0000)]
tiny optimisation in evacuate()
Simon Marlow [Wed, 31 Oct 2007 13:07:18 +0000 (13:07 +0000)]
Initial parallel GC support
e.g. use +RTS -g2 -RTS for 2 threads. Only major GCs are parallelised,
minor GCs are still sequential. Don't use more threads than you
have CPUs.
It works most of the time, although you won't see much speedup yet.
Tuning and more work on stability still required.
Simon Marlow [Wed, 31 Oct 2007 12:51:36 +0000 (12:51 +0000)]
Refactoring of the GC in preparation for parallel GC
This patch localises the state of the GC into a gc_thread structure,
and reorganises the inner loop of the GC to scavenge one block at a
time from global work lists in each "step". The gc_thread structure
has a "workspace" for each step, in which it collects evacuated
objects until it has a full block to push out to the step's global
list. Details of the algorithm will be on the wiki in due course.
At the moment, THREADED_RTS does not compile, but the single-threaded
GC works (and is 10-20% slower than before).
Simon Marlow [Tue, 30 Oct 2007 14:45:09 +0000 (14:45 +0000)]
also count total dispatch stalls in +RTS -as