X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=docs%2Fusers_guide%2Fruntime_control.xml;h=8a7bafd922a5b4418e6301b0743dc59635bab90a;hb=c74dd1f133703da84fdd8a513a3598fc74b67f0b;hp=776b65f9a1a58da65ccc51f0d4b12ee835a2b4ef;hpb=0dfcd5776f3ef89ceaafef6c4730ddac759e3716;p=ghc-hetmet.git diff --git a/docs/users_guide/runtime_control.xml b/docs/users_guide/runtime_control.xml index 776b65f..8a7bafd 100644 --- a/docs/users_guide/runtime_control.xml +++ b/docs/users_guide/runtime_control.xml @@ -110,7 +110,7 @@ increase the resolution of the time profiler. Using a value of zero disables the RTS clock - completetly, and has the effect of disabling timers that + completely, and has the effect of disabling timers that depend on it: the context switch timer and the heap profiling timer. Context switches will still happen, but deterministically and at a rate much faster than normal. @@ -268,6 +268,43 @@ + + threads + RTS option + + + [Default: 1] [new in GHC 6.10] Set the number + of threads to use for garbage collection. This option is + only accepted when the program was linked with the + option; see . + + The garbage collector is able to work in parallel when + given more than one OS thread. Experiments have shown + that this usually results in a performance improvement + given 3 cores or more; with 2 cores it may or may not be + beneficial, depending on the workload. Bigger heaps work + better with parallel GC, so set your + value high (3 or more times the maximum residency). Look + at the timing stats with to + see whether you're getting any benefit from parallel GC or + not. If you find parallel GC is + significantly slower (in elapsed + time) than sequential GC, please report it as a + bug. + + This value is set automatically when the + option is used, so the only reason to + use would be if you wanted to use a + different number of threads for GC than for execution. 
+ For example, if your program is strictly single-threaded + but you still want to benefit from parallel GC, then it + might make sense to use rather than + . + + + + size RTS option @@ -399,46 +436,245 @@ + + file + RTS option + - file + file RTS option - file + file RTS option - Write modest () or verbose - () garbage-collector statistics into file - file. The default - file is - program.stat. The - file stderr - is treated specially, with the output really being sent to - stderr. - - This option is useful for watching how the storage - manager adjusts the heap size based on the current amount of - live data. - - + These options produce runtime-system statistics, such + as the amount of time spent executing the program and in the + garbage collector, the amount of memory allocated, the + maximum size of the heap, and so on. The three + variants give different levels of detail: + produces a single line of output in the + same format as GHC's option, + produces a more detailed summary at the + end of the program, and additionally + produces information about each and every garbage + collection. + + The output is placed in + file. If + file is omitted, then the output + is sent to stderr. + + + If you use the -t flag then, when your + program finishes, you will see something like this: + + + +<<ghc: 36169392 bytes, 69 GCs, 603392/1065272 avg/max bytes residency (2 samples), 3M in use, 0.00 INIT (0.00 elapsed), 0.02 MUT (0.02 elapsed), 0.07 GC (0.07 elapsed) :ghc>> + + + + This tells you: + + + + + + The total bytes allocated by the program. This may be more + than the peak memory use, as some may be freed. + + + + + The total number of garbage collections that occurred. + + + + + The average and maximum space used by your program. + This is only checked during major garbage collections, so it + is only an approximation; the number of samples tells you how + many times it is checked. + + + + + The peak memory the RTS has allocated from the OS. 
+ + + + + The amount of CPU time and elapsed wall clock time while + initialising the runtime system (INIT), running the program + itself (MUT, the mutator), and garbage collecting (GC). + + + + + + If you use the -s flag then, when your + program finishes, you will see something like this (the exact + details will vary depending on what sort of RTS you have, e.g. + you will only see profiling data if your RTS is compiled for + profiling): + + + + 36,169,392 bytes allocated in the heap + 4,057,632 bytes copied during GC + 1,065,272 bytes maximum residency (2 sample(s)) + 54,312 bytes maximum slop + 3 MB total memory in use (0 MB lost due to fragmentation) + + Generation 0: 67 collections, 0 parallel, 0.04s, 0.03s elapsed + Generation 1: 2 collections, 0 parallel, 0.03s, 0.04s elapsed + + INIT time 0.00s ( 0.00s elapsed) + MUT time 0.01s ( 0.02s elapsed) + GC time 0.07s ( 0.07s elapsed) + EXIT time 0.00s ( 0.00s elapsed) + Total time 0.08s ( 0.09s elapsed) + + %GC time 89.5% (75.3% elapsed) + + Alloc rate 4,520,608,923 bytes per MUT second + + Productivity 10.5% of total user, 9.1% of total elapsed + + + + + + The "bytes allocated in the heap" is the total bytes allocated + by the program. This may be more than the peak memory use, as + some may be freed. + + + + + GHC uses a copying garbage collector. "bytes copied during GC" + tells you how many bytes it had to copy during garbage collection. + + + + + The maximum space actually used by your program is the + "bytes maximum residency" figure. This is only checked during + major garbage collections, so it is only an approximation; + the number of samples tells you how many times it is checked. + + + + + The "bytes maximum slop" tells you the most space that is ever + wasted due to the way GHC packs data into so-called "megablocks". + + + + + The "total memory in use" tells you the peak memory the RTS has + allocated from the OS. + + + + + Next there is information about the garbage collections done. 
+ For each generation it says how many garbage collections were + done, how many of those collections used multiple threads, + the total CPU time used for garbage collecting that generation, + and the total wall clock time elapsed while garbage collecting + that generation. + + + + + Next there is the CPU time and wall clock time elapsed, broken + down by what the runtime system was doing at the time. + INIT is the runtime system initialisation. + MUT is the mutator time, i.e. the time spent actually running + your code. + GC is the time spent doing garbage collection. + RP is the time spent doing retainer profiling. + PROF is the time spent doing other profiling. + EXIT is the runtime system shutdown time. + And finally, Total is, of course, the total. + + + %GC time tells you what percentage GC is of Total. + "Alloc rate" tells you the "bytes allocated in the heap" divided + by the MUT CPU time. + "Productivity" tells you what percentage of the Total CPU and wall + clock elapsed times are spent in the mutator (MUT). + + + + + + The -S flag, as well as giving the same + output as the -s flag, prints information + about each GC as it happens: + + + + Alloc Copied Live GC GC TOT TOT Page Flts + bytes bytes bytes user elap user elap + 528496 47728 141512 0.01 0.02 0.02 0.02 0 0 (Gen: 1) +[...] + 524944 175944 1726384 0.00 0.00 0.08 0.11 0 0 (Gen: 0) + + + + For each garbage collection, we print: + + + + + + How many bytes we allocated this garbage collection. + + + + + How many bytes we copied this garbage collection. + + + + + How many bytes are currently live. + + + + + How long this garbage collection took (CPU time and elapsed + wall clock time). + + + + + How long the program has been running (CPU time and elapsed + wall clock time). + + + + + How many page faults occurred this garbage collection. + + + + + How many page faults occurred since the end of the last garbage + collection. + + + + + Which generation is being garbage collected. 
+ + + - - - - RTS option - - - Write a one-line GC stats summary after running the - program. This output is in the same format as that produced - by the option. - - As with , the default - file is - program.stat. The - file stderr - is treated specially, with the output really being sent to - stderr. @@ -446,14 +682,46 @@ - RTS options for profiling and parallelism + RTS options for concurrency and parallelism - The RTS options related to profiling are described in , those for concurrency in + The RTS options related to concurrency are described in , and those for parallelism in . + + RTS options for profiling + + Most profiling runtime options are only available when you + compile your program for profiling (see + , and + for the runtime options). + However, there is one profiling option that is available + for ordinary non-profiled executables: + + + + + + RTS + option + + + Generates a basic heap profile, in the + file prog.hp. + To produce the heap profile graph, + use hp2ps (see ). The basic heap profile is broken down by data + constructor, with other types of closures (functions, thunks, + etc.) grouped into broad categories + (e.g. FUN, THUNK). To + get a more detailed profile, use the full profiling + support (). + + + + + RTS options for hackers, debuggers, and over-interested souls