X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=docs%2Fusers_guide%2Fruntime_control.xml;h=1af2c956ed71030dfb7e0ca8a5b73ae441ca1462;hb=fd2bd22ed4483f949e989b827013fcda695c803a;hp=8a7bafd922a5b4418e6301b0743dc59635bab90a;hpb=c74dd1f133703da84fdd8a513a3598fc74b67f0b;p=ghc-hetmet.git diff --git a/docs/users_guide/runtime_control.xml b/docs/users_guide/runtime_control.xml index 8a7bafd..1af2c95 100644 --- a/docs/users_guide/runtime_control.xml +++ b/docs/users_guide/runtime_control.xml @@ -130,6 +130,35 @@ own signal handlers. + + + + RTS + option + + + WARNING: this option is for working around memory + allocation problems only. Do not use unless GHCi fails + with a message like “failed to mmap() memory below 2Gb”. If you need to use this option to get GHCi working + on your machine, please file a bug. + + + + On 64-bit machines, the RTS needs to allocate memory in the + low 2Gb of the address space. Support for this across + different operating systems is patchy, and sometimes fails. + This option is there to give the RTS a hint about where it + should be able to allocate memory in the low 2Gb of the + address space. For example, +RTS -xm20000000 + -RTS would hint that the RTS should allocate + starting at the 0.5Gb mark. The default is to use the OS's + built-in support for allocating memory in the low 2Gb if + available (e.g. mmap + with MAP_32BIT on Linux), or + otherwise -xm40000000. + + + @@ -269,38 +298,52 @@ - threads - RTS option + + RTS + option + + + [New in GHC 6.12.1] Disable the parallel GC. + The parallel GC is turned on automatically when parallel + execution is enabled with the option; + this option is available to turn it off if + necessary. + + Experiments have shown that parallel GC usually + results in a performance improvement given 3 cores or + more; with 2 cores it may or may not be beneficial, + depending on the workload. Bigger heaps work better with + parallel GC, so set your value high (3 + or more times the maximum residency). 
Look at the timing + stats with to see whether you're + getting any benefit from parallel GC or not. If you find + parallel GC is significantly slower + (in elapsed time) than sequential GC, please report it as + a bug. + + In GHC 6.10.1 it was possible to use a different + number of threads for GC than for execution, because the GC + used its own pool of threads. Now, the GC uses the same + threads as the mutator (for executing the program). + + + + + + + RTS + option - [Default: 1] [new in GHC 6.10] Set the number - of threads to use for garbage collection. This option is - only accepted when the program was linked with the - option; see . - - The garbage collector is able to work in parallel when - given more than one OS thread. Experiments have shown - that this usually results in a performance improvement - given 3 cores or more; with 2 cores it may or may not be - beneficial, depending on the workload. Bigger heaps work - better with parallel GC, so set your - value high (3 or more times the maximum residency). Look - at the timing stats with to - see whether you're getting any benefit from parallel GC or - not. If you find parallel GC is - significantly slower (in elapsed - time) than sequential GC, please report it as a - bug. - - This value is set automatically when the - option is used, so the only reason to - use would be if you wanted to use a - different number of threads for GC than for execution. - For example, if your program is strictly single-threaded - but you still want to benefit from parallel GC, then it - might make sense to use rather than - . + + [Default: 1] [New in GHC 6.12.1] + Enable the parallel GC only in + generation n and greater. + Parallel GC is often not worthwhile for collections in + generation 0 (the young generation), so it is enabled by + default only for collections in generation 1 (and higher, + if applicable). 
+ @@ -448,6 +491,10 @@ file RTS option + + + RTS option + These options produce runtime-system statistics, such as the amount of time spent executing the program and in the @@ -514,6 +561,27 @@ + You can also get this in a more future-proof, machine-readable + format, with -t --machine-readable: + + + + [("bytes allocated", "36169392") + ,("num_GCs", "69") + ,("average_bytes_used", "603392") + ,("max_bytes_used", "1065272") + ,("num_byte_usage_samples", "2") + ,("peak_megabytes_allocated", "3") + ,("init_cpu_seconds", "0.00") + ,("init_wall_seconds", "0.00") + ,("mutator_cpu_seconds", "0.02") + ,("mutator_wall_seconds", "0.02") + ,("GC_cpu_seconds", "0.07") + ,("GC_wall_seconds", "0.07") + ] + + + If you use the -s flag then, when your program finishes, you will see something like this (the exact details will vary depending on what sort of RTS you have, e.g. @@ -531,6 +599,8 @@ Generation 0: 67 collections, 0 parallel, 0.04s, 0.03s elapsed Generation 1: 2 collections, 0 parallel, 0.03s, 0.04s elapsed + SPARKS: 359207 (557 converted, 149591 pruned) + INIT time 0.00s ( 0.00s elapsed) MUT time 0.01s ( 0.02s elapsed) GC time 0.07s ( 0.07s elapsed) @@ -589,6 +659,17 @@ + The SPARKS statistic refers to the + use of Control.Parallel.par and related + functionality in the program. Each spark represents a call + to par; a spark is "converted" when it is + executed in parallel; and a spark is "pruned" when it is + found to be already evaluated and is discarded from the pool + by the garbage collector. Any remaining sparks are + discarded at the end of execution, so "converted" plus + "pruned" does not necessarily add up to the total. + + Next there is the CPU time and wall clock time elapsed, broken down by what the runtime system was doing at the time. @@ -939,18 +1020,137 @@ char *ghc_rts_opts = "-H128m -K1m"; itself. To do this, use the flag, e.g. 
$ ./a.out +RTS --info - [("GHC RTS", "Yes") + [("GHC RTS", "YES") ,("GHC version", "6.7") ,("RTS way", "rts_p") ,("Host platform", "x86_64-unknown-linux") + ,("Host architecture", "x86_64") + ,("Host OS", "linux") + ,("Host vendor", "unknown") ,("Build platform", "x86_64-unknown-linux") + ,("Build architecture", "x86_64") + ,("Build OS", "linux") + ,("Build vendor", "unknown") ,("Target platform", "x86_64-unknown-linux") + ,("Target architecture", "x86_64") + ,("Target OS", "linux") + ,("Target vendor", "unknown") + ,("Word size", "64") ,("Compiler unregisterised", "NO") ,("Tables next to code", "YES") ] The information is formatted such that it can be read as a - of type [(String, String)]. + of type [(String, String)]. Currently the following + fields are present: + + + + + GHC RTS + + Is this program linked against the GHC RTS? (always + "YES"). + + + + + GHC version + + The version of GHC used to compile this program. + + + + + RTS way + + The variant (“way”) of the runtime. The + most common values are rts (vanilla), + rts_thr (threaded runtime, i.e. linked using the + -threaded option) and rts_p + (profiling runtime, i.e. linked using the -prof + option). Other variants include debug + (linked using -debug), + t (ticky-ticky profiling) and + dyn (the RTS is + linked in dynamically, i.e. a shared library, rather than statically + linked into the executable itself). These can be combined, + e.g. you might have rts_thr_debug_p. + + + + + + Target platform, + Target architecture, + Target OS, + Target vendor + + + These describe the platform the program is compiled to run on. + + + + + + Build platform, + Build architecture, + Build OS, + Build vendor + + + These describe the platform on which the program was + built. (That is, the target platform of GHC itself.) Ordinarily + this is identical to the target platform. (It could potentially + be different if cross-compiling.) 
+ + + + + + Host platform, + Host architecture, + Host OS, + Host vendor + + + These describe the platform on which GHC itself was compiled. + Again, this would normally be identical to the build and + target platforms. + + + + + Word size + + Either "32" or "64", + reflecting the word size of the target platform. + + + + + Compiler unregisterised + + Was this program compiled with an “unregisterised” + version of GHC? (I.e., a version of GHC that has no platform-specific + optimisations compiled in, usually because this is a currently + unsupported platform.) This value will usually be "NO", unless you're + using an experimental build of GHC. + + + + + Tables next to code + + Putting info tables directly next to entry code is a useful + performance optimisation that is not available on all platforms. + This field tells you whether the program has been compiled with + this optimisation. (Usually "YES", except on unusual platforms.) + + + + + 
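
Editor's illustration (not part of the patch): since the --info output is described above as readable at type [(String, String)], a minimal Haskell sketch of consuming it might look like the following. The names sampleInfo and parseRtsInfo are hypothetical, and the sample text is an abbreviated copy of the example output shown earlier, not the result of running a real program. The -t --machine-readable output shown earlier has the same shape, so the same approach should apply to it.

```haskell
import Text.Read (readMaybe)

-- Abbreviated sample in the shape of the `+RTS --info` output above.
-- Assumption: a real program would capture this text from the
-- executable's standard output instead of hard-coding it.
sampleInfo :: String
sampleInfo = unlines
  [ " [(\"GHC RTS\", \"YES\")"
  , " ,(\"GHC version\", \"6.7\")"
  , " ,(\"RTS way\", \"rts_p\")"
  , " ,(\"Word size\", \"64\")"
  , " ,(\"Tables next to code\", \"YES\")"
  , " ]"
  ]

-- Parse the whole text as a value of type [(String, String)].
-- readMaybe returns Nothing if the text is not in that format.
parseRtsInfo :: String -> Maybe [(String, String)]
parseRtsInfo = readMaybe

main :: IO ()
main =
  case parseRtsInfo sampleInfo of
    Nothing     -> putStrLn "could not parse RTS info"
    Just fields -> do
      -- Look up individual fields by their key.
      print (lookup "RTS way" fields)
      -- Values are strings; convert numeric ones explicitly.
      print (lookup "Word size" fields >>= readMaybe :: Maybe Int)
```

Using readMaybe rather than read keeps the sketch total: if a future RTS changes the format, the caller gets Nothing instead of a runtime exception.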