Keep the remembered sets local to each thread during parallel GC
author Simon Marlow <marlowsd@gmail.com>
Mon, 12 Jan 2009 12:10:24 +0000
committer Simon Marlow <marlowsd@gmail.com>
Mon, 12 Jan 2009 12:10:24 +0000
commit 6a405b1efd138a4af4ed93ce4ff173a4c5704512
tree d11e6ba4cb32b3c447065b0e928e245d6639058d
parent 192c7d555448b8a78d57a5c01c0c20f642f2d0f3
This turns out to be quite vital for parallel programs:

  - The way we discover which threads to traverse is by finding
    dirty threads via the remembered sets (aka mutable lists).

  - A dirty thread will be on the remembered set of the capability
    that was running it, and we really want to traverse that thread's
    stack using the GC thread for that capability, because the stack
    is in that CPU's cache.  If we get this wrong, we get penalised
    badly by the memory system.

Previously we had per-capability mutable lists, but they were
aggregated before GC and traversed by just one of the GC threads.
This resulted in very poor performance, particularly for parallel
programs with deep stacks.

Now we keep per-capability remembered sets throughout GC, which also
removes a lock (recordMutableGen_sync).
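
As a rough illustration of the idea (not the RTS code itself), the sketch
below keeps one remembered set per capability and has each GC thread scan
only its own. All names here (RememberedSet, record_dirty, scavenge_local,
count_visit, N_CAPS, MAX_ENTRIES) are hypothetical and chosen for the
example. It assumes each list is written only by its own capability, which
is why the sketch takes no lock.

```c
#include <assert.h>
#include <stddef.h>

#define N_CAPS      4   /* hypothetical number of capabilities */
#define MAX_ENTRIES 16  /* hypothetical fixed capacity per list */

/* Hypothetical stand-in for one capability's remembered set (mutable list). */
typedef struct {
    int    entries[MAX_ENTRIES];  /* ids of dirty objects, e.g. threads */
    size_t count;
} RememberedSet;

static RememberedSet rem_sets[N_CAPS];   /* one list per capability */
static size_t        visited_by[N_CAPS]; /* visits made by each GC thread */

/* Mutator side: record a dirty object on the list of the capability that
 * is running it. Assumes a single writer per list, so no lock is taken.
 * Returns 1 on success, 0 if the list is full. */
static int record_dirty(size_t cap, int obj_id)
{
    assert(cap < N_CAPS);
    RememberedSet *rs = &rem_sets[cap];
    if (rs->count == MAX_ENTRIES) {
        return 0;
    }
    rs->entries[rs->count++] = obj_id;
    return 1;
}

/* Example visitor: counts how many entries each GC thread traversed. */
static void count_visit(size_t gc_thread, int obj_id)
{
    (void)obj_id;
    visited_by[gc_thread]++;
}

/* GC side: GC thread `gc_thread` traverses only the remembered set of its
 * own capability, then empties it. Returns the number of entries visited. */
static size_t scavenge_local(size_t gc_thread, void (*visit)(size_t, int))
{
    assert(gc_thread < N_CAPS);
    RememberedSet *rs = &rem_sets[gc_thread];
    size_t n = rs->count;
    for (size_t i = 0; i < n; i++) {
        visit(gc_thread, rs->entries[i]);
    }
    rs->count = 0;
    return n;
}
```

In this sketch an object dirtied on capability 2 is only ever visited by
GC thread 2, so the data is traversed on the CPU that last touched it.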
12 files changed:
includes/Storage.h
rts/Capability.c
rts/Capability.h
rts/Stats.c
rts/Updates.h
rts/sm/Compact.c
rts/sm/GC.c
rts/sm/GCThread.h
rts/sm/GCUtils.h
rts/sm/Scav.c
rts/sm/Scav.h
rts/sm/Storage.c