X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=docs%2Fusers_guide%2Fruntime_control.xml;h=c482a28a91d3f3d4dfb6b6089681176015e4dd99;hb=26f164e5759e9eca73deb0531ddec422d36a6924;hp=776b65f9a1a58da65ccc51f0d4b12ee835a2b4ef;hpb=0dfcd5776f3ef89ceaafef6c4730ddac759e3716;p=ghc-hetmet.git
diff --git a/docs/users_guide/runtime_control.xml b/docs/users_guide/runtime_control.xml
index 776b65f..c482a28 100644
--- a/docs/users_guide/runtime_control.xml
+++ b/docs/users_guide/runtime_control.xml
@@ -110,7 +110,7 @@
increase the resolution of the time profiler.
Using a value of zero disables the RTS clock
- completetly, and has the effect of disabling timers that
+ completely, and has the effect of disabling timers that
depend on it: the context switch timer and the heap profiling
timer. Context switches will still happen, but
deterministically and at a rate much faster than normal.
@@ -130,6 +130,35 @@
own signal handlers.
+
+
+
+ RTS
+ option
+
+
+ WARNING: this option is for working around memory
+ allocation problems only. Do not use unless GHCi fails
+ with a message like “failed to mmap() memory below 2Gb”. If you need to use this option to get GHCi working
+ on your machine, please file a bug.
+
+
+
+ On 64-bit machines, the RTS needs to allocate memory in the
+ low 2Gb of the address space. Support for this across
+ different operating systems is patchy, and sometimes fails.
+ This option is there to give the RTS a hint about where it
+ should be able to allocate memory in the low 2Gb of the
+ address space. For example, +RTS -xm20000000
+ -RTS would hint that the RTS should allocate
+ starting at the 0.5Gb mark. The default is to use the OS's
+ built-in support for allocating memory in the low 2Gb if
+ available (e.g. mmap
+ with MAP_32BIT on Linux), or
+ otherwise -xm40000000.
+
+
+
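The 0.5Gb figure above follows if the argument to -xm is read as a hexadecimal byte address. The sketch below checks that arithmetic; the names parseXmAddress and addressInGb are hypothetical helpers for illustration, not part of the RTS.

```haskell
import Numeric (readHex)

-- | Parse the hexadecimal argument of -xm (without the "-xm" prefix)
-- into a byte address. Assumption: the argument is plain hex digits.
-- Returns Nothing on malformed or empty input.
parseXmAddress :: String -> Maybe Integer
parseXmAddress s = case readHex s of
  [(n, "")] -> Just n
  _         -> Nothing

-- | Express a byte address in units of 2^30 bytes ("Gb" in the text above).
addressInGb :: Integer -> Double
addressInGb n = fromIntegral n / 2 ^ (30 :: Int)
```

In GHCi, fmap addressInGb (parseXmAddress "20000000") evaluates to Just 0.5, and the fallback value 40000000 corresponds to the 1Gb mark.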
@@ -153,7 +182,7 @@
allocation area, size
- [Default: 256k] Set the allocation area size
+ [Default: 512k] Set the allocation area size
used by the garbage collector. The allocation area
(actually generation 0 step 0) is fixed and is never resized
(unless you use , below).
@@ -268,6 +297,64 @@
+
+
+ RTS
+ option
+
+
+ [New in GHC 6.12.1] [Default: 0]
+ Use parallel GC in
+ generation gen and higher.
+ Omitting gen turns off the
+ parallel GC completely, reverting to sequential GC.
+
+ The default parallel GC settings are usually suitable
+ for parallel programs (i.e. those
+ using par, Strategies, or with multiple
+ threads). However, it is sometimes beneficial to enable
+ the parallel GC for a single-threaded sequential program
+ too, especially if the program has a large amount of heap
+ data and GC is a significant fraction of runtime. To use
+ the parallel GC in a sequential program, enable the
+ parallel runtime with a suitable -N
+ option, and additionally it might be beneficial to
+ restrict parallel GC to the old generation
+ with -qg1.
+
+
+
+
+
+
+ RTS
+ option
+
+
+
+ [New in GHC 6.12.1] [Default: 1] Use
+ load-balancing in the parallel GC in
+ generation gen and higher.
+ Omitting gen disables
+ load-balancing entirely.
+
+
+ Load-balancing shares out the work of GC between the
+ available cores. This is a good idea when the heap is
+ large and we need to parallelise the GC work. However, it
+ is pessimal for the short young-generation
+ collections in a parallel program, because it can harm
+ locality by moving data from the cache of the CPU where it
+ is being used to the cache of another CPU. Hence the
+ default is to do load-balancing only in the
+ old generation. In fact, for a parallel program it is
+ sometimes beneficial to disable load-balancing entirely
+ with -qb.
+
+
+
+
+ sizeRTS option
@@ -399,46 +486,290 @@
+
+ file
+ RTS option
+
- file
+ fileRTS option
- file
+ fileRTS option
-
- Write modest () or verbose
- () garbage-collector statistics into file
- file. The default
- file is
- program.stat. The
- filestderr
- is treated specially, with the output really being sent to
- stderr.
-
- This option is useful for watching how the storage
- manager adjusts the heap size based on the current amount of
- live data.
-
-
-
-
-
- RTS option
+
+ RTS option
- Write a one-line GC stats summary after running the
- program. This output is in the same format as that produced
- by the option.
-
- As with , the default
- file is
- program.stat. The
- filestderr
- is treated specially, with the output really being sent to
- stderr.
+ These options produce runtime-system statistics, such
+ as the amount of time spent executing the program and in the
+ garbage collector, the amount of memory allocated, the
+ maximum size of the heap, and so on. The three
+ variants give different levels of detail:
+ -t produces a single line of output in the
+ same format as GHC's -Rghc-timing option,
+ -s produces a more detailed summary at the
+ end of the program, and -S additionally
+ produces information about each and every garbage
+ collection.
+
+ The output is placed in
+ file. If
+ file is omitted, then the output
+ is sent to stderr.
+
+
+ If you use the -t flag then, when your
+ program finishes, you will see something like this:
+
+
+
+<<ghc: 36169392 bytes, 69 GCs, 603392/1065272 avg/max bytes residency (2 samples), 3M in use, 0.00 INIT (0.00 elapsed), 0.02 MUT (0.02 elapsed), 0.07 GC (0.07 elapsed) :ghc>>
+
+
+
+ This tells you:
+
+
+
+
+
+ The total number of bytes allocated by the program over the
+ whole run.
+
+
+
+
+ The total number of garbage collections performed.
+
+
+
+
+ The average and maximum "residency", which is the amount of
+ live data in bytes. The runtime can only determine the
+ amount of live data during a major GC, which is why the
+ number of samples corresponds to the number of major GCs
+ (and is usually relatively small). To get a better picture
+ of the heap profile of your program, use
+ the -hT RTS option.
+
+
+
+
+ The peak memory the RTS has allocated from the OS.
+
+
+
+
+ The amount of CPU time and elapsed wall clock time while
+ initialising the runtime system (INIT), running the program
+ itself (MUT, the mutator), and garbage collecting (GC).
+
+
+
+
+
+ You can also get this in a more future-proof, machine readable
+ format, with -t --machine-readable:
+
+
+
+ [("bytes allocated", "36169392")
+ ,("num_GCs", "69")
+ ,("average_bytes_used", "603392")
+ ,("max_bytes_used", "1065272")
+ ,("num_byte_usage_samples", "2")
+ ,("peak_megabytes_allocated", "3")
+ ,("init_cpu_seconds", "0.00")
+ ,("init_wall_seconds", "0.00")
+ ,("mutator_cpu_seconds", "0.02")
+ ,("mutator_wall_seconds", "0.02")
+ ,("GC_cpu_seconds", "0.07")
+ ,("GC_wall_seconds", "0.07")
+ ]
+
+
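Because the machine-readable output has the shape of a Haskell list of string pairs, it can be consumed with the standard Read instances. The following is a minimal sketch, assuming the text handed to parseStats is exactly the list shown above with nothing before or after it; RtsStats, parseStats and statField are hypothetical names introduced for illustration.

```haskell
import Text.Read (readMaybe)

-- | The -t --machine-readable output, viewed as key/value pairs.
type RtsStats = [(String, String)]

-- | Parse the raw output; Nothing if it is not a well-formed list of pairs.
parseStats :: String -> Maybe RtsStats
parseStats = readMaybe

-- | Look up one field and read its value at the requested type,
-- e.g. Integer for "bytes allocated" or Double for "GC_cpu_seconds".
statField :: Read a => String -> RtsStats -> Maybe a
statField key stats = lookup key stats >>= readMaybe
```

For example, statField "num_GCs" stats :: Maybe Int yields the collection count once the output has been parsed. Looking fields up by name, rather than by position, keeps the code working if new fields are added to the list.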
+
+ If you use the -s flag then, when your
+ program finishes, you will see something like this (the exact
+ details will vary depending on what sort of RTS you have, e.g.
+ you will only see profiling data if your RTS is compiled for
+ profiling):
+
+
+
+ 36,169,392 bytes allocated in the heap
+ 4,057,632 bytes copied during GC
+ 1,065,272 bytes maximum residency (2 sample(s))
+ 54,312 bytes maximum slop
+ 3 MB total memory in use (0 MB lost due to fragmentation)
+
+ Generation 0: 67 collections, 0 parallel, 0.04s, 0.03s elapsed
+ Generation 1: 2 collections, 0 parallel, 0.03s, 0.04s elapsed
+
+ SPARKS: 359207 (557 converted, 149591 pruned)
+
+ INIT time 0.00s ( 0.00s elapsed)
+ MUT time 0.01s ( 0.02s elapsed)
+ GC time 0.07s ( 0.07s elapsed)
+ EXIT time 0.00s ( 0.00s elapsed)
+ Total time 0.08s ( 0.09s elapsed)
+
+ %GC time 89.5% (75.3% elapsed)
+
+ Alloc rate 4,520,608,923 bytes per MUT second
+
+ Productivity 10.5% of total user, 9.1% of total elapsed
+
+
+
+
+
+ The "bytes allocated in the heap" is the total bytes allocated
+ by the program over the whole run.
+
+
+
+
+ GHC uses a copying garbage collector by default. "bytes copied
+ during GC" tells you how many bytes it had to copy during
+ garbage collection.
+
+
+
+
+ The maximum space actually used by your program is the
+ "bytes maximum residency" figure. This is only checked during
+ major garbage collections, so it is only an approximation;
+ the number of samples tells you how many times it is checked.
+
+
+
+
+ The "bytes maximum slop" tells you the most space that is ever
+ wasted due to the way GHC allocates memory in blocks. Slop is
+ memory at the end of a block that was wasted. There's no way
+ to control this; we just like to see how much memory is being
+ lost this way.
+
+
+
+
+ The "total memory in use" tells you the peak memory the RTS has
+ allocated from the OS.
+
+
+
+
+ Next there is information about the garbage collections done.
+ For each generation it says how many garbage collections were
+ done, how many of those collections were done in parallel,
+ the total CPU time used for garbage collecting that generation,
+ and the total wall clock time elapsed while garbage collecting
+ that generation.
+
+
+
+ The SPARKS statistic refers to the
+ use of Control.Parallel.par and related
+ functionality in the program. Each spark represents a call
+ to par; a spark is "converted" when it is
+ executed in parallel; and a spark is "pruned" when it is
+ found to be already evaluated and is discarded from the pool
+ by the garbage collector. Any remaining sparks are
+ discarded at the end of execution, so "converted" plus
+ "pruned" does not necessarily add up to the total.
+
+
+
+ Next there is the CPU time and wall clock time elapsed broken
+ down by what the runtime system was doing at the time.
+ INIT is the runtime system initialisation.
+ MUT is the mutator time, i.e. the time spent actually running
+ your code.
+ GC is the time spent doing garbage collection.
+ RP is the time spent doing retainer profiling.
+ PROF is the time spent doing other profiling.
+ EXIT is the runtime system shutdown time.
+ And finally, Total is, of course, the total.
+
+
+ %GC time tells you what percentage GC is of Total.
+ "Alloc rate" tells you the "bytes allocated in the heap" divided
+ by the MUT CPU time.
+ "Productivity" tells you what percentage of the Total CPU and wall
+ clock elapsed times are spent in the mutator (MUT).
+
+
+
+
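The derived figures described above (%GC time, Alloc rate and Productivity) are simple ratios. The sketch below restates them as functions; the names are hypothetical, and the code works on whatever time figures it is given rather than on the RTS's internal counters.

```haskell
-- | One time figure as a percentage of another; Nothing if the
-- denominator is not positive.
percentOf :: Double -> Double -> Maybe Double
percentOf part whole
  | whole > 0 = Just (100 * part / whole)
  | otherwise = Nothing

-- | "%GC time": GC time as a percentage of Total time.
gcPercent :: Double -> Double -> Maybe Double
gcPercent gcTime totalTime = percentOf gcTime totalTime

-- | "Productivity": MUT time as a percentage of Total time.
productivity :: Double -> Double -> Maybe Double
productivity mutTime totalTime = percentOf mutTime totalTime

-- | "Alloc rate": bytes allocated in the heap per second of MUT CPU time.
allocRate :: Integer -> Double -> Maybe Double
allocRate bytes mutCpuTime
  | mutCpuTime > 0 = Just (fromIntegral bytes / mutCpuTime)
  | otherwise      = Nothing
```

Feeding in the rounded figures printed in the sample (GC 0.07s, Total 0.08s) gives roughly 87.5% rather than the reported 89.5%, presumably because the RTS computes its percentages from unrounded times. Recomputing from the printed summary is therefore only approximate.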
+
+ The -S flag, as well as giving the same
+ output as the -s flag, prints information
+ about each GC as it happens:
+
+
+
+ Alloc Copied Live GC GC TOT TOT Page Flts
+ bytes bytes bytes user elap user elap
+ 528496 47728 141512 0.01 0.02 0.02 0.02 0 0 (Gen: 1)
+[...]
+ 524944 175944 1726384 0.00 0.00 0.08 0.11 0 0 (Gen: 0)
+
+
+
+ For each garbage collection, we print:
+
+
+
+
+
+ How many bytes we allocated this garbage collection.
+
+
+
+
+ How many bytes we copied this garbage collection.
+
+
+
+
+ How many bytes are currently live.
+
+
+
+
+ How long this garbage collection took (CPU time and elapsed
+ wall clock time).
+
+
+
+
+ How long the program has been running (CPU time and elapsed
+ wall clock time).
+
+
+
+
+ How many page faults occurred this garbage collection.
+
+
+
+
+ How many page faults occurred since the end of the last garbage
+ collection.
+
+
+
+
+ Which generation is being garbage collected.
+
+
+
+
@@ -446,14 +777,132 @@
- RTS options for profiling and parallelism
+ RTS options for concurrency and parallelism
- The RTS options related to profiling are described in , those for concurrency in
+ The RTS options related to concurrency are described in
, and those for parallelism in
.
+
+ RTS options for profiling
+
+ Most profiling runtime options are only available when you
+ compile your program for profiling (see
+ , and
+ for the runtime options).
+ However, there is one profiling option that is available
+ for ordinary non-profiled executables:
+
+
+
+
+
+ RTS
+ option
+
+
+ Generates a basic heap profile, in the
+ file prog.hp.
+ To produce the heap profile graph,
+ use hp2ps (see ). The basic heap profile is broken down by data
+ constructor, with other types of closures (functions, thunks,
+ etc.) grouped into broad categories
+ (e.g. FUN, THUNK). To
+ get a more detailed profile, use the full profiling
+ support ().
+
+
+
+
+
+
+ Tracing
+
+ tracing
+ events
+ eventlog files
+
+
+ When the program is linked with the
+ option (), runtime events can
+ be logged in two ways:
+
+
+
+
+
+ In binary format to a file for later analysis by a
+ variety of tools. One such tool
+ is ThreadScope,
+ which interprets the event log to produce a visual parallel
+ execution profile of the program.
+
+
+
+
+ As text to standard output, for debugging purposes.
+
+
+
+
+
+
+
+
+ RTS option
+
+
+
+ Log events in binary format to the
+ file program.eventlog,
+ where type indicates the type
+ of events to log. Currently there is only one type
+ supported: -ls, for scheduler events.
+
+
+
+ The format of the log file is described by the header
+ EventLogFormat.h that comes with
+ GHC, and it can be parsed in Haskell using
+ the ghc-events
+ library. To dump the contents of
+ a .eventlog file as text, use the
+ tool show-ghc-events that comes with
+ the ghc-events
+ package.
+
+
+
+
+
+
+
+ RTS option
+
+
+
+ Log events as text to standard output, instead of to
+ the .eventlog file.
+
+
+
+
+
+
+
+ The debugging
+ options also
+ generate events which are logged using the tracing framework.
+ By default those events are dumped as text to stdout
+ (
+ implies ), but they may instead be stored in
+ the binary eventlog file by using the
+ option.
+
+
+
RTS options for hackers, debuggers, and over-interested
souls
@@ -490,14 +939,28 @@
- num
+ x-DRTS option
- An RTS debugging flag; varying quantities of output
- depending on which bits are set in
- num. Only works if the RTS was
- compiled with the option.
+
+ An RTS debugging flag; only available if the program was
+ linked with the -debug option. Various
+ values of x are provided to
+ enable debug messages and additional runtime sanity checks
+ in different subsystems in the RTS, for
+ example +RTS -Ds -RTS enables debug
+ messages from the scheduler.
+ Use +RTS -? to find out which
+ debug flags are supported.
+
+
+
+ Debug messages will be sent to the binary event log file
+ instead of stdout if the option is
+ added. This might be useful for reducing the overhead of
+ debug tracing.
+
@@ -510,20 +973,13 @@
Produce “ticky-ticky” statistics at the
- end of the program run. The file
- business works just like on the RTS
- option (above).
-
- “Ticky-ticky” statistics are counts of
- various program actions (updates, enters, etc.) The program
- must have been compiled using
-
- (a.k.a. “ticky-ticky profiling”), and, for it to
- be really useful, linked with suitable system libraries.
- Not a trivial undertaking: consult the installation guide on
- how to set things up for easy “ticky-ticky”
- profiling. For more information, see .
+ end of the program run (only available if the program was
+ linked with ).
+ The file business works just like
+ on the RTS option, above.
+
+ For more information on ticky-ticky profiling, see
+ .
@@ -671,18 +1127,137 @@ char *ghc_rts_opts = "-H128m -K1m";
itself. To do this, use the flag, e.g.
$ ./a.out +RTS --info
- [("GHC RTS", "Yes")
+ [("GHC RTS", "YES")
,("GHC version", "6.7")
,("RTS way", "rts_p")
,("Host platform", "x86_64-unknown-linux")
+ ,("Host architecture", "x86_64")
+ ,("Host OS", "linux")
+ ,("Host vendor", "unknown")
,("Build platform", "x86_64-unknown-linux")
+ ,("Build architecture", "x86_64")
+ ,("Build OS", "linux")
+ ,("Build vendor", "unknown")
,("Target platform", "x86_64-unknown-linux")
+ ,("Target architecture", "x86_64")
+ ,("Target OS", "linux")
+ ,("Target vendor", "unknown")
+ ,("Word size", "64")
,("Compiler unregisterised", "NO")
,("Tables next to code", "YES")
]
The information is formatted such that it can be read as a
- of type [(String, String)].
+ of type [(String, String)]. Currently the following
+ fields are present:
+
+
+
+
+ GHC RTS
+
+ Is this program linked against the GHC RTS? (always
+ "YES").
+
+
+
+
+ GHC version
+
+ The version of GHC used to compile this program.
+
+
+
+
+ RTS way
+
+ The variant (“way”) of the runtime. The
+ most common values are rts (vanilla),
+ rts_thr (threaded runtime, i.e. linked using the
+ -threaded option) and rts_p
+ (profiling runtime, i.e. linked using the -prof
+ option). Other variants include debug
+ (linked using -debug),
+ t (ticky-ticky profiling) and
+ dyn (the RTS is
+ linked in dynamically, i.e. a shared library, rather than statically
+ linked into the executable itself). These can be combined,
+ e.g. you might have rts_thr_debug_p.
+
+
+
+
+
+ Target platform,
+ Target architecture,
+ Target OS,
+ Target vendor
+
+
+ These describe the platform the program is compiled to run on.
+
+
+
+
+
+ Build platform,
+ Build architecture,
+ Build OS,
+ Build vendor
+
+
+ These describe the platform on which the program was
+ built. (That is, the target platform of GHC itself.) Ordinarily
+ this is identical to the target platform. (It could potentially
+ be different if cross-compiling.)
+
+
+
+
+
+ Host platform,
+ Host architecture,
+ Host OS,
+ Host vendor
+
+
+ These describe the platform on which GHC itself was compiled.
+ Again, this would normally be identical to the build and
+ target platforms.
+
+
+
+
+ Word size
+
+ Either "32" or "64",
+ reflecting the word size of the target platform.
+
+
+
+
+ Compiler unregisterised
+
+ Was this program compiled with an “unregisterised”
+ version of GHC? (I.e., a version of GHC that has no platform-specific
+ optimisations compiled in, usually because this is a currently
+ unsupported platform.) This value will usually be "NO", unless you're
+ using an experimental build of GHC.
+
+
+
+
+ Tables next to code
+
+ Putting info tables directly next to entry code is a useful
+ performance optimisation that is not available on all platforms.
+ This field tells you whether the program has been compiled with
+ this optimisation. (Usually yes, except on unusual platforms.)
+
+
+
+
+
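Since the --info output can be read as a value of type [(String, String)], a program can inspect it with read and lookup. The sketch below extracts the components of the "RTS way" field. It assumes the raw text is exactly the list printed by --info, and that combined ways are separated by underscores as in the rts_thr_debug_p example; splitWay and rtsWayComponents are hypothetical names.

```haskell
import Text.Read (readMaybe)

-- | Split a way name such as "rts_thr_debug_p" at each underscore.
splitWay :: String -> [String]
splitWay s = case break (== '_') s of
  (part, [])       -> [part]
  (part, _ : rest) -> part : splitWay rest

-- | Parse raw --info output and return the components of "RTS way".
-- Nothing if the text does not parse or the field is absent.
rtsWayComponents :: String -> Maybe [String]
rtsWayComponents raw = do
  info <- readMaybe raw :: Maybe [(String, String)]
  splitWay <$> lookup "RTS way" info
```

A caller could then test for a particular variant, for instance checking whether "thr" is among the components to see if the threaded runtime is in use.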