X-Git-Url: http://git.megacz.com/?p=ghc-hetmet.git;a=blobdiff_plain;f=docs%2Fusers_guide%2Fruntime_control.xml;h=14732c57c3aa52da7a6e812ebbae65b12b130a8f;hp=776b65f9a1a58da65ccc51f0d4b12ee835a2b4ef;hb=63dd4db64df2949448ceef1adb3c885c3ebb03b9;hpb=0dfcd5776f3ef89ceaafef6c4730ddac759e3716 diff --git a/docs/users_guide/runtime_control.xml b/docs/users_guide/runtime_control.xml index 776b65f..14732c5 100644 --- a/docs/users_guide/runtime_control.xml +++ b/docs/users_guide/runtime_control.xml @@ -10,7 +10,8 @@ code and then links it with a non-trivial runtime system (RTS), which handles storage management, profiling, etc. - You have some control over the behaviour of the RTS, by giving + If you use the -rtsopts flag when linking, + you have some control over the behaviour of the RTS, by giving special command-line arguments to your program. When your Haskell program starts up, its RTS extracts @@ -68,7 +69,8 @@ environment variablefor setting RTS options - RTS options are also taken from the environment variable + When the -rtsopts flag is used when linking, + RTS options are also taken from the environment variable GHCRTSGHCRTS . For example, to set the maximum heap size to 128M for all GHC-compiled programs (using an @@ -110,7 +112,7 @@ increase the resolution of the time profiler. Using a value of zero disables the RTS clock - completetly, and has the effect of disabling timers that + completely, and has the effect of disabling timers that depend on it: the context switch timer and the heap profiling timer. Context switches will still happen, but deterministically and at a rate much faster than normal. @@ -130,6 +132,35 @@ own signal handlers. + + + + RTS + option + + + WARNING: this option is for working around memory + allocation problems only. Do not use unless GHCi fails + with a message like “failed to mmap() memory below 2Gb”. If you need to use this option to get GHCi working + on your machine, please file a bug. 
+ + + + On 64-bit machines, the RTS needs to allocate memory in the + low 2Gb of the address space. Support for this across + different operating systems is patchy, and sometimes fails. + This option is there to give the RTS a hint about where it + should be able to allocate memory in the low 2Gb of the + address space. For example, +RTS -xm20000000 + -RTS would hint that the RTS should allocate + starting at the 0.5Gb mark. The default is to use the OS's + built-in support for allocating memory in the low 2Gb if + available (e.g. mmap + with MAP_32BIT on Linux), or + otherwise -xm40000000. + + + @@ -153,7 +184,7 @@ allocation area, size - [Default: 256k] Set the allocation area size + [Default: 512k] Set the allocation area size used by the garbage collector. The allocation area (actually generation 0 step 0) is fixed and is never resized (unless you use , below). @@ -268,6 +299,64 @@ + + + RTS + option + + + [New in GHC 6.12.1] [Default: 0] + Use parallel GC in + generation gen and higher. + Omitting gen turns off the + parallel GC completely, reverting to sequential GC. + + The default parallel GC settings are usually suitable + for parallel programs (i.e. those + using par, Strategies, or with multiple + threads). However, it is sometimes beneficial to enable + the parallel GC for a single-threaded sequential program + too, especially if the program has a large amount of heap + data and GC is a significant fraction of runtime. To use + the parallel GC in a sequential program, enable the + parallel runtime with a suitable -N + option, and additionally it might be beneficial to + restrict parallel GC to the old generation + with -qg1. + + + + + + + RTS + option + + + + [New in GHC 6.12.1] [Default: 1] Use + load-balancing in the parallel GC in + generation gen and higher. + Omitting gen disables + load-balancing entirely. + + + Load-balancing shares out the work of GC between the + available cores. 
+ This is a good idea when the heap is + large and we need to parallelise the GC work; however, it + is also pessimal for the short young-generation + collections in a parallel program, because it can harm + locality by moving data from the cache of the CPU where it + is being used to the cache of another CPU. Hence the + default is to do load-balancing only in the + old-generation. In fact, for a parallel program it is + sometimes beneficial to disable load-balancing entirely + with -qb. + + + + + size RTS option @@ -399,46 +488,290 @@ + + file + RTS option + - file + file RTS option - file + file RTS option - - Write modest () or verbose - () garbage-collector statistics into file - file. The default - file is - program.stat. The - file stderr - is treated specially, with the output really being sent to - stderr. - - This option is useful for watching how the storage - manager adjusts the heap size based on the current amount of - live data. - - - - - - RTS option + + RTS option - Write a one-line GC stats summary after running the - program. This output is in the same format as that produced - by the option. - - As with , the default - file is - program.stat. The - file stderr - is treated specially, with the output really being sent to - stderr. + These options produce runtime-system statistics, such + as the amount of time spent executing the program and in the + garbage collector, the amount of memory allocated, the + maximum size of the heap, and so on. The three + variants give different levels of detail: + produces a single line of output in the + same format as GHC's option, + produces a more detailed summary at the + end of the program, and additionally + produces information about each and every garbage + collection. + + The output is placed in + file. If + file is omitted, then the output + is sent to stderr. 
+ + + If you use the -t flag then, when your + program finishes, you will see something like this: + + + +<<ghc: 36169392 bytes, 69 GCs, 603392/1065272 avg/max bytes residency (2 samples), 3M in use, 0.00 INIT (0.00 elapsed), 0.02 MUT (0.02 elapsed), 0.07 GC (0.07 elapsed) :ghc>> + + + + This tells you: + + + + + + The total number of bytes allocated by the program over the + whole run. + + + + + The total number of garbage collections performed. + + + + + The average and maximum "residency", which is the amount of + live data in bytes. The runtime can only determine the + amount of live data during a major GC, which is why the + number of samples corresponds to the number of major GCs + (and is usually relatively small). To get a better picture + of the heap profile of your program, use + the RTS option + (). + + + + + The peak memory the RTS has allocated from the OS. + + + + + The amount of CPU time and elapsed wall clock time while + initialising the runtime system (INIT), running the program + itself (MUT, the mutator), and garbage collecting (GC). + + + + + + You can also get this in a more future-proof, machine readable + format, with -t --machine-readable: + + + + [("bytes allocated", "36169392") + ,("num_GCs", "69") + ,("average_bytes_used", "603392") + ,("max_bytes_used", "1065272") + ,("num_byte_usage_samples", "2") + ,("peak_megabytes_allocated", "3") + ,("init_cpu_seconds", "0.00") + ,("init_wall_seconds", "0.00") + ,("mutator_cpu_seconds", "0.02") + ,("mutator_wall_seconds", "0.02") + ,("GC_cpu_seconds", "0.07") + ,("GC_wall_seconds", "0.07") + ] + + + + If you use the -s flag then, when your + program finishes, you will see something like this (the exact + details will vary depending on what sort of RTS you have, e.g. 
+ you will only see profiling data if your RTS is compiled for + profiling): + + + + 36,169,392 bytes allocated in the heap + 4,057,632 bytes copied during GC + 1,065,272 bytes maximum residency (2 sample(s)) + 54,312 bytes maximum slop + 3 MB total memory in use (0 MB lost due to fragmentation) + + Generation 0: 67 collections, 0 parallel, 0.04s, 0.03s elapsed + Generation 1: 2 collections, 0 parallel, 0.03s, 0.04s elapsed + + SPARKS: 359207 (557 converted, 149591 pruned) + + INIT time 0.00s ( 0.00s elapsed) + MUT time 0.01s ( 0.02s elapsed) + GC time 0.07s ( 0.07s elapsed) + EXIT time 0.00s ( 0.00s elapsed) + Total time 0.08s ( 0.09s elapsed) + + %GC time 89.5% (75.3% elapsed) + + Alloc rate 4,520,608,923 bytes per MUT second + + Productivity 10.5% of total user, 9.1% of total elapsed + + + + + + The "bytes allocated in the heap" is the total bytes allocated + by the program over the whole run. + + + + + GHC uses a copying garbage collector by default. "bytes copied + during GC" tells you how many bytes it had to copy during + garbage collection. + + + + + The maximum space actually used by your program is the + "bytes maximum residency" figure. This is only checked during + major garbage collections, so it is only an approximation; + the number of samples tells you how many times it is checked. + + + + + The "bytes maximum slop" tells you the most space that is ever + wasted due to the way GHC allocates memory in blocks. Slop is + memory at the end of a block that was wasted. There's no way + to control this; we just like to see how much memory is being + lost this way. + + + + + The "total memory in use" tells you the peak memory the RTS has + allocated from the OS. + + + + + Next there is information about the garbage collections done. 
+ For each generation it says how many garbage collections were + done, how many of those collections were done in parallel, + the total CPU time used for garbage collecting that generation, + and the total wall clock time elapsed while garbage collecting + that generation. + + + + The SPARKS statistic refers to the + use of Control.Parallel.par and related + functionality in the program. Each spark represents a call + to par; a spark is "converted" when it is + executed in parallel; and a spark is "pruned" when it is + found to be already evaluated and is discarded from the pool + by the garbage collector. Any remaining sparks are + discarded at the end of execution, so "converted" plus + "pruned" does not necessarily add up to the total. + + + + Next there is the CPU time and wall clock time elapsed broken + down by what the runtime system was doing at the time. + INIT is the runtime system initialisation. + MUT is the mutator time, i.e. the time spent actually running + your code. + GC is the time spent doing garbage collection. + RP is the time spent doing retainer profiling. + PROF is the time spent doing other profiling. + EXIT is the runtime system shutdown time. + And finally, Total is, of course, the total. + + + %GC time tells you what percentage GC is of Total. + "Alloc rate" tells you the "bytes allocated in the heap" divided + by the MUT CPU time. + "Productivity" tells you what percentage of the Total CPU and wall + clock elapsed times are spent in the mutator (MUT). + + + + + + The -S flag, as well as giving the same + output as the -s flag, prints information + about each GC as it happens: + + + + Alloc Copied Live GC GC TOT TOT Page Flts + bytes bytes bytes user elap user elap + 528496 47728 141512 0.01 0.02 0.02 0.02 0 0 (Gen: 1) +[...] + 524944 175944 1726384 0.00 0.00 0.08 0.11 0 0 (Gen: 0) + + + + For each garbage collection, we print: + + + + + + How many bytes we allocated this garbage collection. 
+ + + + + How many bytes we copied this garbage collection. + + + + + How many bytes are currently live. + + + + + How long this garbage collection took (CPU time and elapsed + wall clock time). + + + + + How long the program has been running (CPU time and elapsed + wall clock time). + + + + + How many page faults occurred this garbage collection. + + + + + How many page faults occurred since the end of the last garbage + collection. + + + + + Which generation is being garbage collected. + + + + @@ -446,14 +779,139 @@ - RTS options for profiling and parallelism + RTS options for concurrency and parallelism - The RTS options related to profiling are described in , those for concurrency in + The RTS options related to concurrency are described in , and those for parallelism in . + + RTS options for profiling + + Most profiling runtime options are only available when you + compile your program for profiling (see + , and + for the runtime options). + However, there is one profiling option that is available + for ordinary non-profiled executables: + + + + + + RTS + option + + + Generates a basic heap profile, in the + file prog.hp. + To produce the heap profile graph, + use hp2ps (see ). The basic heap profile is broken down by data + constructor, with other types of closures (functions, thunks, + etc.) grouped into broad categories + (e.g. FUN, THUNK). To + get a more detailed profile, use the full profiling + support (). + + + + + + + Tracing + + tracing + events + eventlog files + + + When the program is linked with the + option (), runtime events can + be logged in two ways: + + + + + + In binary format to a file for later analysis by a + variety of tools. One such tool + is ThreadScope, + which interprets the event log to produce a visual parallel + execution profile of the program. + + + + + As text to standard output, for debugging purposes. 
+ + + + + + + + + RTS option + + + + Log events in binary format to the + file program.eventlog, + where flags is a sequence of + zero or more characters indicating which kinds of events + to log. Currently there is only one type + supported: -ls, for scheduler events. + + + + The format of the log file is described by the header + EventLogFormat.h that comes with + GHC, and it can be parsed in Haskell using + the ghc-events + library. To dump the contents of + a .eventlog file as text, use the + tool show-ghc-events that comes with + the ghc-events + package. + + + + + + + flags + RTS option + + + + Log events as text to standard output, instead of to + the .eventlog file. + The flags are the same as + for , with the additional + option t which indicates that + each event printed should be preceded by a timestamp value + (in the binary .eventlog file, all + events are automatically associated with a timestamp). + + + + + + + + The debugging + options also + generate events which are logged using the tracing framework. + By default those events are dumped as text to stdout + ( + implies ), but they may instead be stored in + the binary eventlog file by using the + option. + + + RTS options for hackers, debuggers, and over-interested souls @@ -490,14 +948,28 @@ - num + x -DRTS option - An RTS debugging flag; varying quantities of output - depending on which bits are set in - num. Only works if the RTS was - compiled with the option. + + An RTS debugging flag; only available if the program was + linked with the option. Various + values of x are provided to + enable debug messages and additional runtime sanity checks + in different subsystems in the RTS; for + example, +RTS -Ds -RTS enables debug + messages from the scheduler. + Use +RTS -? to find out which + debug flags are supported. + + + + Debug messages will be sent to the binary event log file + instead of stdout if the option is + added. This might be useful for reducing the overhead of + debug tracing. 
+ @@ -510,20 +982,13 @@ Produce “ticky-ticky” statistics at the - end of the program run. The file - business works just like on the RTS - option (above). - - “Ticky-ticky” statistics are counts of - various program actions (updates, enters, etc.) The program - must have been compiled using - - (a.k.a. “ticky-ticky profiling”), and, for it to - be really useful, linked with suitable system libraries. - Not a trivial undertaking: consult the installation guide on - how to set things up for easy “ticky-ticky” - profiling. For more information, see . + end of the program run (only available if the program was + linked with ). + The file business works just like + on the RTS option, above. + + For more information on ticky-ticky profiling, see + . @@ -582,6 +1047,20 @@ + + Linker flags to change RTS behaviour + + RTS behaviour, changing + + + GHC lets you exercise rudimentary control over the RTS settings + for any given program, by using the -with-rtsopts + linker flag. For example, to set -H128m -K1m, + link with -with-rtsopts="-H128m -K1m". + + + + “Hooks” to change RTS behaviour @@ -671,18 +1150,137 @@ char *ghc_rts_opts = "-H128m -K1m"; itself. To do this, use the flag, e.g. $ ./a.out +RTS --info - [("GHC RTS", "Yes") + [("GHC RTS", "YES") ,("GHC version", "6.7") ,("RTS way", "rts_p") ,("Host platform", "x86_64-unknown-linux") + ,("Host architecture", "x86_64") + ,("Host OS", "linux") + ,("Host vendor", "unknown") ,("Build platform", "x86_64-unknown-linux") + ,("Build architecture", "x86_64") + ,("Build OS", "linux") + ,("Build vendor", "unknown") ,("Target platform", "x86_64-unknown-linux") + ,("Target architecture", "x86_64") + ,("Target OS", "linux") + ,("Target vendor", "unknown") + ,("Word size", "64") ,("Compiler unregisterised", "NO") ,("Tables next to code", "YES") ] The information is formatted such that it can be read as a - of type [(String, String)]. + of type [(String, String)]. 
+ Currently the following + fields are present: + + + + + GHC RTS + + Is this program linked against the GHC RTS? (always + "YES"). + + + + + GHC version + + The version of GHC used to compile this program. + + + + + RTS way + + The variant (“way”) of the runtime. The + most common values are rts (vanilla), + rts_thr (threaded runtime, i.e. linked using the + -threaded option) and rts_p + (profiling runtime, i.e. linked using the -prof + option). Other variants include debug + (linked using -debug), + t (ticky-ticky profiling) and + dyn (the RTS is + linked in dynamically, i.e. a shared library, rather than statically + linked into the executable itself). These can be combined, + e.g. you might have rts_thr_debug_p. + + + + + + Target platform, + Target architecture, + Target OS, + Target vendor + + + These describe the platform the program is compiled to run on. + + + + + + Build platform, + Build architecture, + Build OS, + Build vendor + + + These describe the platform on which the program was + built. (That is, the target platform of GHC itself.) Ordinarily + this is identical to the target platform. (It could potentially + be different if cross-compiling.) + + + + + + Host platform, + Host architecture, + Host OS, + Host vendor + + + These describe the platform on which GHC itself was compiled. + Again, this would normally be identical to the build and + target platforms. + + + + + Word size + + Either "32" or "64", + reflecting the word size of the target platform. + + + + + Compiler unregisterised + + Was this program compiled with an “unregisterised” + version of GHC? (I.e., a version of GHC that has no platform-specific + optimisations compiled in, usually because this is a currently + unsupported platform.) This value will usually be "NO", unless you're + using an experimental build of GHC. + + + + + Tables next to code + + Putting info tables directly next to entry code is a useful + performance optimisation that is not available on all platforms. 
+ This field tells you whether the program has been compiled with + this optimisation. (Usually yes, except on unusual platforms.) + + + + +
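The text above says that the `+RTS --info` output can be read as a value of type `[(String, String)]`. The following is a minimal sketch of doing that with `readMaybe` from `Text.Read` in base. The names `sampleInfo`, `parseInfo` and `infoField` are hypothetical helpers introduced here for illustration, and the sample string is an abbreviated copy of the output shown earlier rather than output captured from a real program.

```haskell
import Text.Read (readMaybe)

-- Abbreviated sample of `./a.out +RTS --info` output, copied from the
-- example in the text (not captured from a real run).
sampleInfo :: String
sampleInfo = unlines
  [ " [(\"GHC RTS\", \"YES\")"
  , " ,(\"GHC version\", \"6.7\")"
  , " ,(\"RTS way\", \"rts_p\")"
  , " ,(\"Word size\", \"64\")"
  , " ,(\"Tables next to code\", \"YES\")"
  , " ]"
  ]

-- Parse the output as an association list; Nothing if it is malformed.
parseInfo :: String -> Maybe [(String, String)]
parseInfo = readMaybe

-- Look up a single field by name in the raw output.
infoField :: String -> String -> Maybe String
infoField key out = parseInfo out >>= lookup key

main :: IO ()
main = do
  print (infoField "RTS way" sampleInfo)    -- Just "rts_p"
  print (infoField "Word size" sampleInfo)  -- Just "64"
```

In practice the string would come from running the program itself, for instance with `readProcess` from the `process` package. Using `lookup` on field names, rather than relying on position, means the code should keep working if further fields are added to the list.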