Make some profiling flags dynamic
[ghc-hetmet.git] / docs / users_guide / runtime_control.xml
index daed07c..94995b3 100644
 
   </sect2>
 
+  <sect2 id="rts-options-misc">
+    <title>Miscellaneous RTS options</title>
+
+    <variablelist>
+     <varlistentry>
+       <term><option>-V<replaceable>secs</replaceable></option>
+       <indexterm><primary><option>-V</option></primary><secondary>RTS
+       option</secondary></indexterm></term>
+       <listitem>
+         <para>Sets the interval that the RTS clock ticks at.  The
+         runtime uses a single timer signal to count ticks; this timer
+         signal is used to control the context switch timer (<xref
+         linkend="using-concurrent" />) and the heap profiling
+         timer (<xref linkend="rts-options-heap-prof" />).  Also, the
+         time profiler uses the RTS timer signal directly to record
+         time profiling samples.</para>
+
+         <para>Normally, setting the <option>-V</option> option
+         directly is not necessary: the resolution of the RTS timer is
+         adjusted automatically if a short interval is requested with
+         the <option>-C</option> or <option>-i</option> options.
+         However, setting <option>-V</option> is required in order to
+         increase the resolution of the time profiler.</para>
+
+         <para>Using a value of zero disables the RTS clock
+         completely, and has the effect of disabling timers that
+         depend on it: the context switch timer and the heap profiling
+         timer.  Context switches will still happen, but
+         deterministically and at a rate much faster than normal.
+         Disabling the interval timer is useful for debugging, because
+         it eliminates a source of non-determinism at runtime.</para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>--install-signal-handlers=<replaceable>yes|no</replaceable></option>
+       <indexterm><primary><option>--install-signal-handlers</option></primary><secondary>RTS
+       option</secondary></indexterm></term>
+       <listitem>
+         <para>If yes (the default), the RTS installs signal handlers to catch
+         things like ctrl-C. This option is primarily useful when
+         you are using the Haskell code as a DLL and want to install your
+         own signal handlers.</para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-xm<replaceable>address</replaceable></option>
+       <indexterm><primary><option>-xm</option></primary><secondary>RTS
+       option</secondary></indexterm></term>
+       <listitem>
+         <para>
+           WARNING: this option is for working around memory
+           allocation problems only.  Do not use unless GHCi fails
+           with a message like &ldquo;<literal>failed to mmap() memory below 2Gb</literal>&rdquo;.  If you need to use this option to get GHCi working
+           on your machine, please file a bug.
+         </para>
+         
+         <para>
+           On 64-bit machines, the RTS needs to allocate memory in the
+           low 2Gb of the address space.  Support for this across
+           different operating systems is patchy, and sometimes fails.
+           This option is there to give the RTS a hint about where it
+           should be able to allocate memory in the low 2Gb of the
+           address space.  For example, <literal>+RTS -xm20000000
+           -RTS</literal> would hint that the RTS should allocate
+           starting at the 0.5Gb mark.  The default is to use the OS's
+           built-in support for allocating memory in the low 2Gb if
+           available (e.g. <literal>mmap</literal>
+           with <literal>MAP_32BIT</literal> on Linux), or
+           otherwise <literal>-xm40000000</literal>.
+         </para>
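         <para>The example above implies that the <literal>-xm</literal>
         argument is a hexadecimal address. 0x20000000 is 536,870,912
         bytes, or 0.5Gb, and the fallback 0x40000000 is 1Gb. The
         following Haskell sketch does this conversion with
         <literal>Numeric.readHex</literal>. The helpers
         <literal>parseXmAddress</literal> and <literal>toGb</literal> are
         hypothetical and are not part of GHC or its RTS.</para>

```haskell
import Numeric (readHex)

-- Hypothetical helper. Assumption: the digits after -xm are hexadecimal,
-- as the "-xm20000000 = 0.5Gb" example implies.
-- Returns Nothing unless the whole string is valid hexadecimal.
parseXmAddress :: String -> Maybe Integer
parseXmAddress s = case readHex s of
  [(n, "")] -> Just n
  _         -> Nothing

-- Hypothetical helper: express a byte address in Gb, taking 1Gb = 2^30 bytes.
toGb :: Integer -> Double
toGb n = fromIntegral n / 2 ^ (30 :: Int)
```

         For example, in GHCi, <literal>fmap toGb (parseXmAddress "20000000")</literal>
         evaluates to <literal>Just 0.5</literal>.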
+       </listitem>
+     </varlistentry>
+    </variablelist>
+  </sect2>
+
   <sect2 id="rts-options-gc">
     <title>RTS options to control the garbage collector</title>
 
       </varlistentry>
 
       <varlistentry>
+        <term>
+          <option>-g</option><replaceable>threads</replaceable>
+          <indexterm><primary><option>-g</option></primary><secondary>RTS option</secondary></indexterm>
+        </term>
+        <listitem>
+          <para>&lsqb;Default: 1&rsqb; &lsqb;new in GHC 6.10&rsqb; Set the number
+            of threads to use for garbage collection.  This option is
+            only accepted when the program was linked with the
+            <option>-threaded</option> option; see <xref
+            linkend="options-linker" />.</para>
+
+          <para>The garbage collector is able to work in parallel when
+            given more than one OS thread.  Experiments have shown
+            that this usually results in a performance improvement
+            given 3 cores or more; with 2 cores it may or may not be
+            beneficial, depending on the workload.  Bigger heaps work
+            better with parallel GC, so set your <option>-H</option>
+            value high (3 or more times the maximum residency).  Look
+            at the timing stats with <option>+RTS -s</option> to
+            see whether you're getting any benefit from parallel GC or
+            not.  If you find parallel GC is
+            significantly <emphasis>slower</emphasis> (in elapsed
+            time) than sequential GC, please report it as a
+            bug.</para>
+
+          <para>This value is set automatically when the
+            <option>-N</option> option is used, so the only reason to
+            use <option>-g</option> would be if you wanted to use a
+            different number of threads for GC than for execution.
+            For example, if your program is strictly single-threaded
+            but you still want to benefit from parallel GC, then it
+            might make sense to use <option>-g</option> rather than
+            <option>-N</option>.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
        <term>
           <option>-H</option><replaceable>size</replaceable>
           <indexterm><primary><option>-H</option></primary><secondary>RTS option</secondary></indexterm>
       <varlistentry>
        <term>
          <option>-I</option><replaceable>seconds</replaceable>
-         <indexterm><primary><option>-H</option></primary>
+         <indexterm><primary><option>-I</option></primary>
            <secondary>RTS option</secondary>
          </indexterm>
          <indexterm><primary>idle GC</primary>
       </varlistentry>
 
       <varlistentry>
+        <term>
+          <option>-t</option><optional><replaceable>file</replaceable></optional>
+          <indexterm><primary><option>-t</option></primary><secondary>RTS option</secondary></indexterm>
+        </term>
        <term>
-          <option>-s</option><replaceable>file</replaceable>
+          <option>-s</option><optional><replaceable>file</replaceable></optional>
           <indexterm><primary><option>-s</option></primary><secondary>RTS option</secondary></indexterm>
         </term>
        <term>
-          <option>-S</option><replaceable>file</replaceable>
+          <option>-S</option><optional><replaceable>file</replaceable></optional>
           <indexterm><primary><option>-S</option></primary><secondary>RTS option</secondary></indexterm>
         </term>
-       <listitem>
-         <para>Write modest (<option>-s</option>) or verbose
-          (<option>-S</option>) garbage-collector statistics into file
-          <replaceable>file</replaceable>. The default
-          <replaceable>file</replaceable> is
-          <filename><replaceable>program</replaceable>.stat</filename>. The
-          <replaceable>file</replaceable> <constant>stderr</constant>
-          is treated specially, with the output really being sent to
-          <constant>stderr</constant>.</para>
-
-         <para>This option is useful for watching how the storage
-          manager adjusts the heap size based on the current amount of
-          live data.</para>
-       </listitem>
-      </varlistentry>
-
-      <varlistentry>
        <term>
-          <option>-t<replaceable>file</replaceable></option>
-          <indexterm><primary><option>-t</option></primary><secondary>RTS option</secondary></indexterm>
+          <option>--machine-readable</option>
+          <indexterm><primary><option>--machine-readable</option></primary><secondary>RTS option</secondary></indexterm>
         </term>
        <listitem>
-         <para>Write a one-line GC stats summary after running the
-         program.  This output is in the same format as that produced
-         by the <option>-Rghc-timing</option> option.</para>
-
-         <para>As with <option>-s</option>, the default
-          <replaceable>file</replaceable> is
-          <filename><replaceable>program</replaceable>.stat</filename>. The
-          <replaceable>file</replaceable> <constant>stderr</constant>
-          is treated specially, with the output really being sent to
-          <constant>stderr</constant>.</para>
+         <para>These options produce runtime-system statistics, such
+         as the amount of time spent executing the program and in the
+         garbage collector, the amount of memory allocated, the
+         maximum size of the heap, and so on.  The three
+         variants give different levels of detail:
+         <option>-t</option> produces a single line of output in the
+         same format as GHC's <option>-Rghc-timing</option> option,
+         <option>-s</option> produces a more detailed summary at the
+         end of the program, and <option>-S</option> additionally
+         produces information about each and every garbage
+         collection.</para>
+
+          <para>The output is placed in
+          <replaceable>file</replaceable>.  If
+          <replaceable>file</replaceable> is omitted, then the output
+          is sent to <constant>stderr</constant>.</para>
+
+    <para>
+        If you use the <literal>-t</literal> flag then, when your
+        program finishes, you will see something like this:
+    </para>
+
+<programlisting>
+&lt;&lt;ghc: 36169392 bytes, 69 GCs, 603392/1065272 avg/max bytes residency (2 samples), 3M in use, 0.00 INIT (0.00 elapsed), 0.02 MUT (0.02 elapsed), 0.07 GC (0.07 elapsed) :ghc&gt;&gt;
+</programlisting>
+
+    <para>
+        This tells you:
+    </para>
+
+    <itemizedlist>
+      <listitem>
+        <para>
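+          The total number of bytes allocated by the program over the
+          whole run.  This is a cumulative figure, not the peak memory
+          use: memory freed by the garbage collector is reused, so the
+          total allocated is usually much larger than the peak.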
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          The total number of garbage collections that occurred.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          The average and maximum space used by your program.
+          This is only checked during major garbage collections, so it
+          is only an approximation; the number of samples tells you how
+          many times it is checked.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          The peak memory the RTS has allocated from the OS. 
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          The amount of CPU time and elapsed wall clock time while
+          initialising the runtime system (INIT), running the program
+          itself (MUT, the mutator), and garbage collecting (GC).
+        </para>
+      </listitem>
+    </itemizedlist>
+
+    <para>
+        You can also get this in a more future-proof, machine-readable
+        format, with <literal>-t --machine-readable</literal>:
+    </para>
+
+<programlisting>
+ [("bytes allocated", "36169392")
+ ,("num_GCs", "69")
+ ,("average_bytes_used", "603392")
+ ,("max_bytes_used", "1065272")
+ ,("num_byte_usage_samples", "2")
+ ,("peak_megabytes_allocated", "3")
+ ,("init_cpu_seconds", "0.00")
+ ,("init_wall_seconds", "0.00")
+ ,("mutator_cpu_seconds", "0.02")
+ ,("mutator_wall_seconds", "0.02")
+ ,("GC_cpu_seconds", "0.07")
+ ,("GC_wall_seconds", "0.07")
+ ]
+</programlisting>
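    <para>
        The machine-readable output is written as a Haskell list of
        string pairs, so a Haskell program can load it with
        <literal>read</literal> and then convert individual fields to
        numbers.  This sketch uses the safer
        <literal>Text.Read.readMaybe</literal> instead of
        <literal>read</literal>.  It assumes the statistics have already
        been captured into a <literal>String</literal>, for example by
        reading the file given to <option>-t</option>.  The names
        <literal>Stats</literal>, <literal>parseStats</literal> and
        <literal>statValue</literal> are illustrative and are not part of
        any GHC library.
    </para>

```haskell
import Text.Read (readMaybe)

-- The machine-readable output: (field name, value) pairs,
-- with every value stored as a string.
type Stats = [(String, String)]

-- Parse the whole block; Nothing if the text is not a well-formed list.
parseStats :: String -> Maybe Stats
parseStats = readMaybe

-- Look up a field and convert its value to the requested numeric type.
-- Returns Nothing if the field is absent or its value does not parse.
statValue :: Read a => String -> Stats -> Maybe a
statValue key stats = lookup key stats >>= readMaybe
```

    <para>
        For the output above, <literal>statValue "num_GCs" stats :: Maybe Integer</literal>
        gives <literal>Just 69</literal>.
    </para>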
+
+    <para>
+        If you use the <literal>-s</literal> flag then, when your
+        program finishes, you will see something like this (the exact
+        details will vary depending on what sort of RTS you have, e.g.
+        you will only see profiling data if your RTS is compiled for
+        profiling):
+    </para>
+
+<programlisting>
+      36,169,392 bytes allocated in the heap
+       4,057,632 bytes copied during GC
+       1,065,272 bytes maximum residency (2 sample(s))
+          54,312 bytes maximum slop
+               3 MB total memory in use (0 MB lost due to fragmentation)
+
+  Generation 0:    67 collections,     0 parallel,  0.04s,  0.03s elapsed
+  Generation 1:     2 collections,     0 parallel,  0.03s,  0.04s elapsed
+
+  SPARKS: 359207 (557 converted, 149591 pruned)
+
+  INIT  time    0.00s  (  0.00s elapsed)
+  MUT   time    0.01s  (  0.02s elapsed)
+  GC    time    0.07s  (  0.07s elapsed)
+  EXIT  time    0.00s  (  0.00s elapsed)
+  Total time    0.08s  (  0.09s elapsed)
+
+  %GC time      89.5%  (75.3% elapsed)
+
+  Alloc rate    4,520,608,923 bytes per MUT second
+
+  Productivity  10.5% of total user, 9.1% of total elapsed
+</programlisting>
+
+    <itemizedlist>
+      <listitem>
+        <para>
+        The "bytes allocated in the heap" is the total number of bytes
+        allocated by the program over the whole run.  Because memory
+        freed by the garbage collector is reused, this is usually much
+        larger than the peak memory use.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+        GHC uses a copying garbage collector. "bytes copied during GC" 
+        tells you how many bytes it had to copy during garbage collection.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+        The maximum space actually used by your program is the
+        "bytes maximum residency" figure. This is only checked during
+        major garbage collections, so it is only an approximation;
+        the number of samples tells you how many times it is checked.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+        The "bytes maximum slop" tells you the most space that is ever
+        wasted due to the way GHC packs data into so-called "megablocks".
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+        The "total memory in use" tells you the peak memory the RTS has
+        allocated from the OS.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+        Next there is information about the garbage collections done.
+        For each generation it says how many garbage collections were
+        done, how many of those collections used multiple threads,
+        the total CPU time used for garbage collecting that generation,
+        and the total wall clock time elapsed while garbage collecting
+        that generation.
+        </para>
+      </listitem>
+      <listitem>
+        <para>The <literal>SPARKS</literal> statistic refers to the
+          use of <literal>Control.Parallel.par</literal> and related
+          functionality in the program.  Each spark represents a call
+          to <literal>par</literal>; a spark is "converted" when it is
+          executed in parallel; and a spark is "pruned" when it is
+          found to be already evaluated and is discarded from the pool
+          by the garbage collector.  Any remaining sparks are
+          discarded at the end of execution, so "converted" plus
+          "pruned" does not necessarily add up to the total.</para>
+      </listitem>
+      <listitem>
+        <para>
+        Next there is the CPU time and wall clock time elapsed, broken
+        down by what the runtime system was doing at the time.
+        INIT is the runtime system initialisation.
+        MUT is the mutator time, i.e. the time spent actually running
+        your code.
+        GC is the time spent doing garbage collection.
+        RP is the time spent doing retainer profiling.
+        PROF is the time spent doing other profiling.
+        (The RP and PROF lines appear only if your RTS is compiled for
+        profiling, which is why they are absent from the example above.)
+        EXIT is the runtime system shutdown time.
+        And finally, Total is, of course, the total.
+        </para>
+        <para>
+        %GC time tells you what percentage of the Total time was spent in GC.
+        "Alloc rate" tells you the "bytes allocated in the heap" divided
+        by the MUT CPU time.
+        "Productivity" tells you what percentage of the Total CPU and wall
+        clock elapsed times are spent in the mutator (MUT).
+        </para>
+      </listitem>
+    </itemizedlist>
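    <para>
        The derived figures can be recomputed from the individual times.
        The following sketch encodes the definitions given above and
        applies them to the CPU times.  It assumes that Total is the sum
        of the INIT, MUT, GC and EXIT times, as it is in the example
        output, and it ignores RP and PROF.  It is not GHC's own
        computation.  The printed times are rounded to two decimal
        places, so recomputing the percentages from them will not
        necessarily reproduce the printed values exactly.  The type and
        function names are hypothetical.
    </para>

```haskell
-- Hypothetical record of the times (in seconds) printed by +RTS -s.
-- RP and PROF are left out; they appear only for profiled programs.
data Times = Times
  { initTime, mutTime, gcTime, exitTime :: Double }

-- Assumption: Total is the sum of the individual phases.
totalTime :: Times -> Double
totalTime t = initTime t + mutTime t + gcTime t + exitTime t

-- "%GC time": percentage of Total spent in GC.
gcPercent :: Times -> Double
gcPercent t = 100 * gcTime t / totalTime t

-- "Alloc rate": bytes allocated in the heap per second of MUT time.
allocRate :: Integer -> Times -> Double
allocRate bytes t = fromIntegral bytes / mutTime t

-- "Productivity": percentage of Total spent in the mutator.
productivity :: Times -> Double
productivity t = 100 * mutTime t / totalTime t
```

    <para>
        A zero MUT or Total time makes these divisions produce
        <literal>Infinity</literal> or <literal>NaN</literal>.  A more
        careful version would check for that first.
    </para>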
+
+    <para>
+        The <literal>-S</literal> flag, as well as giving the same
+        output as the <literal>-s</literal> flag, prints information
+        about each GC as it happens:
+    </para>
+
+<programlisting>
+    Alloc    Copied     Live    GC    GC     TOT     TOT  Page Flts
+    bytes     bytes     bytes  user  elap    user    elap
+   528496     47728    141512  0.01  0.02    0.02    0.02    0    0  (Gen:  1)
+[...]
+   524944    175944   1726384  0.00  0.00    0.08    0.11    0    0  (Gen:  0)
+</programlisting>
+
+    <para>
+        For each garbage collection, we print:
+    </para>
+
+    <itemizedlist>
+      <listitem>
+        <para>
+          How many bytes were allocated since the previous garbage collection.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          How many bytes were copied during this garbage collection.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          How many bytes are currently live.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          How long this garbage collection took (CPU time and elapsed
+          wall clock time).
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          How long the program has been running (CPU time and elapsed
+          wall clock time).
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          How many page faults occurred during this garbage collection.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          How many page faults occurred since the end of the last garbage
+          collection.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          Which generation is being garbage collected.
+        </para>
+      </listitem>
+    </itemizedlist>
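    <para>
        When post-processing <option>-S</option> output, it helps to
        split each per-GC line into named fields.  The sketch below
        assumes the exact column layout shown above: nine numeric
        columns followed by <literal>(Gen:  n)</literal>.  The layout may
        differ between GHC versions, so treat it as illustrative.  Header
        lines and any other line that does not match are rejected.  All
        names are hypothetical.
    </para>

```haskell
import Text.Read (readMaybe)

-- One per-GC line of +RTS -S output, fields in the column order above.
-- Assumption: the layout matches the example; other GHC versions may differ.
data GcRow = GcRow
  { allocBytes, copiedBytes, liveBytes :: Integer
  , gcUser, gcElapsed, totalUser, totalElapsed :: Double
  , faultsDuringGc, faultsSinceLastGc :: Integer
  , generation :: Int
  } deriving (Show, Eq)

-- Parse one line; Nothing for header lines or any other mismatch.
parseGcRow :: String -> Maybe GcRow
parseGcRow line = case words line of
  [a, c, l, gu, ge, tu, te, f1, f2, "(Gen:", g] ->
    case ( mapM readMaybe [a, c, l, f1, f2]
         , mapM readMaybe [gu, ge, tu, te]
         , readMaybe (takeWhile (/= ')') g)
         ) of
      (Just [a', c', l', f1', f2'], Just [gu', ge', tu', te'], Just g') ->
        Just (GcRow a' c' l' gu' ge' tu' te' f1' f2' g')
      _ -> Nothing
  _ -> Nothing
```

    <para>
        Applying <literal>Data.Maybe.mapMaybe parseGcRow</literal> to the
        lines of the statistics file keeps only the per-collection
        records.
    </para>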
+
        </listitem>
       </varlistentry>
     </variablelist>
   </sect2>
 
   <sect2>
-    <title>RTS options for profiling and Concurrent/Parallel Haskell</title>
+    <title>RTS options for concurrency and parallelism</title>
 
-    <para>The RTS options related to profiling are described in <xref
-    linkend="rts-options-heap-prof"/>; and those for concurrent/parallel
-    stuff, in <xref linkend="parallel-rts-opts"/>.</para>
+    <para>The RTS options related to concurrency are described in
+      <xref linkend="using-concurrent" />, and those for parallelism in
+      <xref linkend="parallel-options"/>.</para>
+  </sect2>
+
+  <sect2 id="rts-profiling">
+    <title>RTS options for profiling</title>
+
+    <para>Most profiling runtime options are only available when you
+    compile your program for profiling (see
+    <xref linkend="prof-compiler-options" />, and
+    <xref linkend="rts-options-heap-prof" /> for the runtime options).
+    However, there is one profiling option that is available
+    for ordinary non-profiled executables:</para>
+
+    <variablelist>
+      <varlistentry>
+        <term>
+          <option>-hT</option>
+          <indexterm><primary><option>-hT</option></primary><secondary>RTS
+              option</secondary></indexterm>
+        </term>
+        <listitem>
+          <para>Generates a basic heap profile, in the
+            file <literal><replaceable>prog</replaceable>.hp</literal>.
+            To produce the heap profile graph,
+            use <command>hp2ps</command> (see <xref linkend="hp2ps" />).
+            The basic heap profile is broken down by data
+            constructor, with other types of closures (functions, thunks,
+            etc.) grouped into broad categories
+            (e.g. <literal>FUN</literal>, <literal>THUNK</literal>).  To
+            get a more detailed profile, use the full profiling
+            support (<xref linkend="profiling" />).</para>
+        </listitem>
+      </varlistentry>
+    </variablelist>
   </sect2>
 
   <sect2 id="rts-options-debugging">
@@ -612,6 +996,29 @@ char *ghc_rts_opts = "-H128m -K1m";
     <filename>ghc/compiler/parser/hschooks.c</filename> in a GHC
     source tree.</para>
   </sect2>
+
+  <sect2>
+    <title>Getting information about the RTS</title>
+
+    <indexterm><primary>RTS</primary></indexterm>
+
+    <para>It is possible to ask the RTS to give some information about
+    itself. To do this, use the <option>--info</option> flag, e.g.</para>
+<screen>
+$ ./a.out +RTS --info
+ [("GHC RTS", "Yes")
+ ,("GHC version", "6.7")
+ ,("RTS way", "rts_p")
+ ,("Host platform", "x86_64-unknown-linux")
+ ,("Build platform", "x86_64-unknown-linux")
+ ,("Target platform", "x86_64-unknown-linux")
+ ,("Compiler unregisterised", "NO")
+ ,("Tables next to code", "YES")
+ ]
+</screen>
+    <para>The information is formatted such that it can be read as a
+    value of type <literal>[(String, String)]</literal>.</para>
+  </sect2>
 </sect1>
 
 <!-- Emacs stuff: