code and then links it with a non-trivial runtime system (RTS),
which handles storage management, profiling, etc.</para>
- <para>You have some control over the behaviour of the RTS, by giving
+ <para>If you set the <literal>-rtsopts</literal> flag appropriately when linking,
+ you have some control over the behaviour of the RTS, by giving
special command-line arguments to your program.</para>
<para>When your Haskell program starts up, its RTS extracts
wraparound in the counters is <emphasis>your</emphasis>
fault!)</para>
- <para>Giving a <literal>+RTS -f</literal>
- <indexterm><primary><option>-f</option></primary><secondary>RTS option</secondary></indexterm> option
+ <para>Giving a <literal>+RTS -?</literal>
+ <indexterm><primary><option>-?</option></primary><secondary>RTS option</secondary></indexterm> option
will print out the RTS options actually available in your program
(which vary, depending on how you compiled).</para>
<literal>+RTS -M128m -RTS</literal>
to the command line.</para>
- <sect2 id="rts-optinos-environment">
+ <sect2 id="rts-options-environment">
<title>Setting global RTS options</title>
<indexterm><primary>RTS options</primary><secondary>from the environment</secondary></indexterm>
<indexterm><primary>environment variable</primary><secondary>for
setting RTS options</secondary></indexterm>
- <para>RTS options are also taken from the environment variable
+ <para>If the <literal>-rtsopts</literal> flag is set to
+ something other than <literal>none</literal> when linking,
+ RTS options are also taken from the environment variable
<envar>GHCRTS</envar><indexterm><primary><envar>GHCRTS</envar></primary>
</indexterm>. For example, to set the maximum heap size
to 128M for all GHC-compiled programs (using an
things like ctrl-C. This option is primarily useful for when
you are using the Haskell code as a DLL, and want to set your
own signal handlers.</para>
+
+ <para>Note that even
+ with <option>--install-signal-handlers=no</option>, the RTS
+ interval timer signal is still enabled. The timer signal
+ is either SIGVTALRM or SIGALRM, depending on the RTS
+ configuration and OS capabilities. To disable the timer
+ signal, use the <literal>-V0</literal> RTS option (see
+ above).
+ </para>
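+
+ <para>As a rough sketch, assuming a hypothetical
+ executable <filename>prog</filename> that was linked
+ with <option>-rtsopts</option>, both the default signal
+ handlers and the timer signal could be turned off with:</para>
+
+ <screen>
+ $ ./prog +RTS --install-signal-handlers=no -V0 -RTS
+ </screen>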
</listitem>
</varlistentry>
<varlistentry>
<term>
- <option>-q1</option>
- <indexterm><primary><option>-q1</option><secondary>RTS
+ <option>-qg<optional><replaceable>gen</replaceable></optional></option>
+ <indexterm><primary><option>-qg</option><secondary>RTS
option</secondary></primary></indexterm>
</term>
<listitem>
- <para>[New in GHC 6.12.1] Disable the parallel GC.
- The parallel GC is turned on automatically when parallel
- execution is enabled with the <option>-N</option> option;
- this option is available to turn it off if
- necessary.</para>
+ <para>[New in GHC 6.12.1] [Default: 0]
+ Use parallel GC in
+ generation <replaceable>gen</replaceable> and higher.
+ Omitting <replaceable>gen</replaceable> turns off the
+ parallel GC completely, reverting to sequential GC.</para>
- <para>Experiments have shown that parallel GC usually
- results in a performance improvement given 3 cores or
- more; with 2 cores it may or may not be beneficial,
- depending on the workload. Bigger heaps work better with
- parallel GC, so set your <option>-H</option> value high (3
- or more times the maximum residency). Look at the timing
- stats with <option>+RTS -s</option> to see whether you're
- getting any benefit from parallel GC or not. If you find
- parallel GC is significantly <emphasis>slower</emphasis>
- (in elapsed time) than sequential GC, please report it as
- a bug.</para>
-
- <para>In GHC 6.10.1 it was possible to use a different
- number of threads for GC than for execution, because the GC
- used its own pool of threads. Now, the GC uses the same
- threads as the mutator (for executing the program).</para>
+ <para>The default parallel GC settings are usually suitable
+ for parallel programs (i.e. those
+ using <literal>par</literal>, Strategies, or with multiple
+ threads). However, it is sometimes beneficial to enable
+ the parallel GC for a single-threaded sequential program
+ too, especially if the program has a large amount of heap
+ data and GC is a significant fraction of runtime. To use
+ the parallel GC in a sequential program, enable the
+ parallel runtime with a suitable <literal>-N</literal>
+ option, and additionally it might be beneficial to
+ restrict parallel GC to the old generation
+ with <literal>-qg1</literal>.</para>
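+
+ <para>For illustration only, assuming a hypothetical
+ program <filename>prog.hs</filename>, a sequential program
+ might be built and run with the parallel GC restricted to the
+ old generation roughly as follows (the
+ value <literal>-N2</literal> is an arbitrary choice, not a
+ recommendation):</para>
+
+ <screen>
+ $ ghc -threaded -rtsopts prog.hs
+ $ ./prog +RTS -N2 -qg1 -RTS
+ </screen>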
</listitem>
</varlistentry>
<varlistentry>
<term>
- <option>-qg<replaceable>n</replaceable></option>
- <indexterm><primary><option>-qg</option><secondary>RTS
+ <option>-qb<optional><replaceable>gen</replaceable></optional></option>
+ <indexterm><primary><option>-qb</option><secondary>RTS
option</secondary></primary></indexterm>
</term>
<listitem>
<para>
- [Default: 1] [New in GHC 6.12.1]
- Enable the parallel GC only in
- generation <replaceable>n</replaceable> and greater.
- Parallel GC is often not worthwhile for collections in
- generation 0 (the young generation), so it is enabled by
- default only for collections in generation 1 (and higher,
- if applicable).
+ [New in GHC 6.12.1] [Default: 1] Use
+ load-balancing in the parallel GC in
+ generation <replaceable>gen</replaceable> and higher.
+ Omitting <replaceable>gen</replaceable> disables
+ load-balancing entirely.</para>
+
+ <para>
+ Load-balancing shares out the work of GC between the
+ available cores. This is a good idea when the heap is
+ large and we need to parallelise the GC work. However, it
+ is also pessimal for the short young-generation
+ collections in a parallel program, because it can harm
+ locality by moving data from the cache of the CPU where
+ it is being used to the cache of another CPU. Hence the
+ default is to do load-balancing only in the
+ old generation. In fact, for a parallel program it is
+ sometimes beneficial to disable load-balancing entirely
+ with <literal>-qb</literal>.
</para>
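+
+ <para>One way to check whether load-balancing helps, sketched
+ here for a hypothetical executable <filename>prog</filename>
+ linked with <option>-threaded</option>
+ and <option>-rtsopts</option>, is to compare the timing
+ statistics from <option>-s</option> across runs
+ (<literal>-N4</literal> is an arbitrary example value):</para>
+
+ <screen>
+ $ ./prog +RTS -N4 -s -RTS        # default: load-balancing in generation 1 and higher
+ $ ./prog +RTS -N4 -qb -s -RTS    # load-balancing disabled entirely
+ $ ./prog +RTS -N4 -qb0 -s -RTS   # load-balancing in all generations
+ </screen>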
</listitem>
</varlistentry>
<varlistentry>
<term>
- <option>-k</option><replaceable>size</replaceable>
+ <option>-ki</option><replaceable>size</replaceable>
<indexterm><primary><option>-k</option></primary><secondary>RTS option</secondary></indexterm>
- <indexterm><primary>stack, minimum size</primary></indexterm>
+ <indexterm><primary>stack, initial size</primary></indexterm>
</term>
<listitem>
- <para>[Default: 1k] Set the initial stack size for
- new threads. Thread stacks (including the main thread's
- stack) live on the heap, and grow as required. The default
- value is good for concurrent applications with lots of small
- threads; if your program doesn't fit this model then
- increasing this option may help performance.</para>
-
- <para>The main thread is normally started with a slightly
- larger heap to cut down on unnecessary stack growth while
- the program is starting up.</para>
- </listitem>
+ <para>
+ [Default: 1k] Set the initial stack size for new
+ threads. (Note: this flag used to be
+ simply <option>-k</option>, but was renamed
+ to <option>-ki</option> in GHC 7.2.1. The old name is
+ still accepted for backwards compatibility, but that may
+ be removed in a future version.)
+ </para>
+
+ <para>
+ Thread stacks (including the main thread's stack) live on
+ the heap. As the stack grows, new stack chunks are added
+ as required; if the stack shrinks again, these extra stack
+ chunks are reclaimed by the garbage collector. The
+ default initial stack size is deliberately small, in order
+ to keep the time and space overhead for thread creation to
+ a minimum, and to make it practical to spawn threads for
+ even tiny pieces of work.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-kc</option><replaceable>size</replaceable>
+ <indexterm><primary><option>-kc</option></primary><secondary>RTS
+ option</secondary></indexterm>
+ <indexterm><primary>stack</primary><secondary>chunk size</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ [Default: 32k] Set the size of “stack
+ chunks”. When a thread's current stack overflows, a
+ new stack chunk is created and added to the thread's
+ stack, until the limit set by <option>-K</option> is
+ reached.
+ </para>
+
+ <para>
+ The advantage of smaller stack chunks is that the garbage
+ collector can avoid traversing stack chunks if they are
+ known to be unmodified since the last collection, so
+ reducing the chunk size means that the garbage collector
+ can identify more stack as unmodified, and the GC overhead
+ might be reduced. On the other hand, making stack chunks
+ too small adds some overhead as there will be more
+ overflow/underflow between chunks. The default setting of
+ 32k appears to be a reasonable compromise in most cases.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-kb</option><replaceable>size</replaceable>
+ <indexterm><primary><option>-kb</option></primary><secondary>RTS
+ option</secondary></indexterm>
+ <indexterm><primary>stack</primary><secondary>chunk buffer size</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ [Default: 1k] Set the stack chunk buffer size.
+ When a stack chunk overflows and a new stack chunk is
+ created, some of the data from the previous stack chunk is
+ moved into the new chunk, to avoid an immediate underflow
+ and repeated overflow/underflow at the boundary. The
+ amount of stack moved is set by the <option>-kb</option>
+ option.
+ </para>
+ <para>
+ Note that to avoid wasting space, this value should
+ typically be less than 10% of the size of a stack
+ chunk (<option>-kc</option>), because in a chain of stack
+ chunks, each chunk will have a gap of unused space of this
+ size.
+ </para>
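+ <para>
+ As a hedged example with purely illustrative sizes, a
+ hypothetical executable <filename>prog</filename> (linked
+ with <option>-rtsopts</option>) could be run with 64k stack
+ chunks and a 2k chunk buffer, which keeps the buffer at
+ roughly 3% of the chunk size:
+ </para>
+
+ <screen>
+ $ ./prog +RTS -ki4k -kc64k -kb2k -RTS
+ </screen>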
+ </listitem>
</varlistentry>
<varlistentry>
<listitem>
<para>[Default: 8M] Set the maximum stack size for
an individual thread to <replaceable>size</replaceable>
- bytes. This option is there purely to stop the program
- eating up all the available memory in the machine if it gets
- into an infinite loop.</para>
+ bytes. If the thread attempts to exceed this limit, it will
+ be sent the <literal>StackOverflow</literal> exception.
+ </para>
+ <para>
+ This option is there mainly to stop the program eating up
+ all the available memory in the machine if it gets into an
+ infinite loop.
+ </para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
+ <sect2 id="rts-eventlog">
+ <title>Tracing</title>
+
+ <indexterm><primary>tracing</primary></indexterm>
+ <indexterm><primary>events</primary></indexterm>
+ <indexterm><primary>eventlog files</primary></indexterm>
+
+ <para>
+ When the program is linked with the <option>-eventlog</option>
+ option (<xref linkend="options-linker" />), runtime events can
+ be logged in two ways:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ In binary format to a file for later analysis by a
+ variety of tools. One such tool
+ is <ulink url="http://hackage.haskell.org/package/ThreadScope">ThreadScope</ulink><indexterm><primary>ThreadScope</primary></indexterm>,
+ which interprets the event log to produce a visual parallel
+ execution profile of the program.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ As text to standard output, for debugging purposes.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <option>-l<optional><replaceable>flags</replaceable></optional></option>
+ <indexterm><primary><option>-l</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Log events in binary format to the
+ file <filename><replaceable>program</replaceable>.eventlog</filename>,
+ where <replaceable>flags</replaceable> is a sequence of
+ zero or more characters indicating which kinds of events
+ to log. Currently there is only one type
+ supported: <literal>-ls</literal>, for scheduler events.
+ </para>
+
+ <para>
+ The format of the log file is described by the header
+ <filename>EventLogFormat.h</filename> that comes with
+ GHC, and it can be parsed in Haskell using
+ the <ulink url="http://hackage.haskell.org/package/ghc-events">ghc-events</ulink>
+ library. To dump the contents of
+ a <literal>.eventlog</literal> file as text, use the
+ tool <literal>show-ghc-events</literal> that comes with
+ the <ulink url="http://hackage.haskell.org/package/ghc-events">ghc-events</ulink>
+ package.
+ </para>
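+
+ <para>
+ A typical workflow might look like the following sketch,
+ where <filename>prog.hs</filename> is a hypothetical
+ program and <literal>show-ghc-events</literal> is assumed
+ to be installed from the ghc-events package:
+ </para>
+
+ <screen>
+ $ ghc -eventlog -rtsopts prog.hs
+ $ ./prog +RTS -ls -RTS
+ $ show-ghc-events prog.eventlog
+ </screen>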
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <option>-v</option><optional><replaceable>flags</replaceable></optional>
+ <indexterm><primary><option>-v</option></primary><secondary>RTS option</secondary></indexterm>
+ </term>
+ <listitem>
+ <para>
+ Log events as text to standard output, instead of to
+ the <literal>.eventlog</literal> file.
+ The <replaceable>flags</replaceable> are the same as
+ for <option>-l</option>, with the additional
+ option <literal>t</literal> which indicates that
+ each event printed should be preceded by a timestamp value
+ (in the binary <literal>.eventlog</literal> file, all
+ events are automatically associated with a timestamp).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ <para>
+ The debugging
+ options <option>-D<replaceable>x</replaceable></option> also
+ generate events which are logged using the tracing framework.
+ By default those events are dumped as text to stdout
+ (<option>-D<replaceable>x</replaceable></option>
+ implies <option>-v</option>), but they may instead be stored in
+ the binary eventlog file by using the <option>-l</option>
+ option.
+ </para>
+ </sect2>
+
<sect2 id="rts-options-debugging">
<title>RTS options for hackers, debuggers, and over-interested
souls</title>
<varlistentry>
<term>
- <option>-D</option><replaceable>num</replaceable>
+ <option>-D</option><replaceable>x</replaceable>
<indexterm><primary>-D</primary><secondary>RTS option</secondary></indexterm>
</term>
<listitem>
- <para>An RTS debugging flag; varying quantities of output
- depending on which bits are set in
- <replaceable>num</replaceable>. Only works if the RTS was
- compiled with the <option>DEBUG</option> option.</para>
+ <para>
+ An RTS debugging flag; only available if the program was
+ linked with the <option>-debug</option> option. Various
+ values of <replaceable>x</replaceable> are provided to
+ enable debug messages and additional runtime sanity checks
+ in different subsystems in the RTS, for
+ example <literal>+RTS -Ds -RTS</literal> enables debug
+ messages from the scheduler.
+ Use <literal>+RTS -?</literal> to find out which
+ debug flags are supported.
+ </para>
+
+ <para>
+ Debug messages will be sent to the binary event log file
+ instead of stdout if the <option>-l</option> option is
+ added. This might be useful for reducing the overhead of
+ debug tracing.
+ </para>
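+
+ <para>
+ For instance, assuming a hypothetical
+ program <filename>prog.hs</filename>, scheduler debug
+ messages could be sent first to stdout and then to the
+ event log:
+ </para>
+
+ <screen>
+ $ ghc -debug -rtsopts prog.hs
+ $ ./prog +RTS -Ds -RTS
+ $ ./prog +RTS -Ds -l -RTS
+ </screen>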
</listitem>
</varlistentry>
</term>
<listitem>
<para>Produce “ticky-ticky” statistics at the
- end of the program run. The <replaceable>file</replaceable>
- business works just like on the <option>-S</option> RTS
- option (above).</para>
-
- <para>“Ticky-ticky” statistics are counts of
- various program actions (updates, enters, etc.) The program
- must have been compiled using
- <option>-ticky</option><indexterm><primary><option>-ticky</option></primary></indexterm>
- (a.k.a. “ticky-ticky profiling”), and, for it to
- be really useful, linked with suitable system libraries.
- Not a trivial undertaking: consult the installation guide on
- how to set things up for easy “ticky-ticky”
- profiling. For more information, see <xref
- linkend="ticky-ticky"/>.</para>
+ end of the program run (only available if the program was
+ linked with <option>-debug</option>).
+ The <replaceable>file</replaceable> business works just like
+ on the <option>-S</option> RTS option, above.</para>
+
+ <para>For more information on ticky-ticky profiling, see
+ <xref linkend="ticky-ticky"/>.</para>
</listitem>
</varlistentry>
</sect2>
+ <sect2>
+ <title>Linker flags to change RTS behaviour</title>
+
+ <indexterm><primary>RTS behaviour, changing</primary></indexterm>
+
+ <para>
+ GHC lets you exercise rudimentary control over the RTS settings
+ for any given program, by using the <literal>-with-rtsopts</literal>
+ linker flag. For example, to set <literal>-H128m -K1m</literal>,
+ link with <literal>-with-rtsopts="-H128m -K1m"</literal>.
+ </para>
+
+ </sect2>
+
<sect2 id="rts-hooks">
<title>“Hooks” to change RTS behaviour</title>
<para>The hook <literal>ghc_rts_opts</literal><indexterm><primary><literal>ghc_rts_opts</literal></primary>
</indexterm>lets you set RTS
- options permanently for a given program. A common use for this is
+ options permanently for a given program, in the same way as the
+ newer <option>-with-rtsopts</option> linker option does. A common use for this is
to give your program a default heap and/or stack size that is
greater than the default. For example, to set <literal>-H128m
-K1m</literal>, place the following definition in a C source
<!-- Emacs stuff:
;;; Local Variables: ***
- ;;; mode: xml ***
;;; sgml-parent-document: ("users_guide.xml" "book" "chapter" "sect1") ***
;;; End: ***
-->