X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=docs%2Fusers_guide%2Fruntime_control.xml;h=2b16234496a254dbcba699bb000f4c7cc50e0b42;hb=f04dead93a15af1cb818172f207b8a81d2c81298;hp=d8735e209df1f6ddd83d03d5ae93421756bb6a6e;hpb=00fe691ba258b2d9c8d5d85a3dffc0224b426dd8;p=ghc-hetmet.git

diff --git a/docs/users_guide/runtime_control.xml b/docs/users_guide/runtime_control.xml
index d8735e2..2b16234 100644
--- a/docs/users_guide/runtime_control.xml
+++ b/docs/users_guide/runtime_control.xml
@@ -182,7 +182,7 @@
           <indexterm><primary>allocation area, size</primary></indexterm>
         </term>
 	<listitem>
-	  <para>&lsqb;Default: 256k&rsqb; Set the allocation area size
+	  <para>&lsqb;Default: 512k&rsqb; Set the allocation area size
           used by the garbage collector.  The allocation area
           (actually generation 0 step 0) is fixed and is never resized
           (unless you use <option>-H</option>, below).</para>
@@ -298,51 +298,58 @@
 
       <varlistentry>
         <term>
-          <option>-q1</option>
-          <indexterm><primary><option>-q1</option><secondary>RTS
+          <option>-qg<optional><replaceable>gen</replaceable></optional></option>
+          <indexterm><primary><option>-qg</option><secondary>RTS
           option</secondary></primary></indexterm>
         </term>
         <listitem>
-          <para>&lsqb;New in GHC 6.12.1&rsqb; Disable the parallel GC.
-            The parallel GC is turned on automatically when parallel
-            execution is enabled with the <option>-N</option> option;
-            this option is available to turn it off if
-            necessary.</para>
+          <para>&lsqb;New in GHC 6.12.1&rsqb; &lsqb;Default: 0&rsqb;
+            Use parallel GC in
+            generation <replaceable>gen</replaceable> and higher.
+            Omitting <replaceable>gen</replaceable> turns off the
+            parallel GC completely, reverting to sequential GC.</para>
           
-          <para>Experiments have shown that parallel GC usually
-            results in a performance improvement given 3 cores or
-            more; with 2 cores it may or may not be beneficial,
-            depending on the workload.  Bigger heaps work better with
-            parallel GC, so set your <option>-H</option> value high (3
-            or more times the maximum residency).  Look at the timing
-            stats with <option>+RTS -s</option> to see whether you're
-            getting any benefit from parallel GC or not.  If you find
-            parallel GC is significantly <emphasis>slower</emphasis>
-            (in elapsed time) than sequential GC, please report it as
-            a bug.</para>
-
-          <para>In GHC 6.10.1 it was possible to use a different
-            number of threads for GC than for execution, because the GC
-            used its own pool of threads.  Now, the GC uses the same
-            threads as the mutator (for executing the program).</para>
+          <para>The default parallel GC settings are usually suitable
+            for parallel programs (i.e. those
+            using <literal>par</literal>, Strategies, or with multiple
+            threads).  However, it is sometimes beneficial to enable
+            the parallel GC for a single-threaded sequential program
+            too, especially if the program has a large amount of heap
+            data and GC is a significant fraction of runtime.  To use
+            the parallel GC in a sequential program, enable the
+            parallel runtime with a suitable <literal>-N</literal>
+            option, and additionally it might be beneficial to
+            restrict parallel GC to the old generation
+            with <literal>-qg1</literal>.</para>
         </listitem>
       </varlistentry>        
 
       <varlistentry>
         <term>
-          <option>-qg<replaceable>n</replaceable></option>
-          <indexterm><primary><option>-qg</option><secondary>RTS
+          <option>-qb<optional><replaceable>gen</replaceable></optional></option>
+          <indexterm><primary><option>-qb</option><secondary>RTS
           option</secondary></primary></indexterm>
         </term>
         <listitem>
           <para>
-            &lsqb;Default: 1&rsqb; &lsqb;New in GHC 6.12.1&rsqb;
-            Enable the parallel GC only in
-            generation <replaceable>n</replaceable> and greater.
-            Parallel GC is often not worthwhile for collections in
-            generation 0 (the young generation), so it is enabled by
-            default only for collections in generation 1 (and higher,
-            if applicable).
+            &lsqb;New in GHC 6.12.1&rsqb; &lsqb;Default: 1&rsqb; Use
+            load-balancing in the parallel GC in
+            generation <replaceable>gen</replaceable> and higher.
+            Omitting <replaceable>gen</replaceable> disables
+            load-balancing entirely.</para>
+          
+          <para>
+            Load-balancing shares out the work of GC between the
+            available cores.  This is a good idea when the heap is
+            large and we need to parallelise the GC work; however, it
+            is also pessimal for the short young-generation
+            collections in a parallel program, because it can harm
+            locality by moving data from the cache of the CPU where it
+            is being used to the cache of another CPU.  Hence the
+            default is to do load-balancing only in the
+            old generation.  In fact, for a parallel program it is
+            sometimes beneficial to disable load-balancing entirely
+            with <literal>-qb</literal>.
           </para>
         </listitem>
       </varlistentry>
@@ -529,21 +536,25 @@
     <itemizedlist>
       <listitem>
         <para>
-          The total bytes allocated by the program. This may be less
-          than the peak memory use, as some may be freed. 
+          The total number of bytes allocated by the program over the
+          whole run.
         </para>
       </listitem>
       <listitem>
         <para>
-          The total number of garbage collections that occurred.
+          The total number of garbage collections performed.
         </para>
       </listitem>
       <listitem>
         <para>
-          The average and maximum space used by your program.
-          This is only checked during major garbage collections, so it
-          is only an approximation; the number of samples tells you how
-          many times it is checked.
+          The average and maximum "residency", which is the amount of
+          live data in bytes.  The runtime can only determine the
+          amount of live data during a major GC, which is why the
+          number of samples corresponds to the number of major GCs
+          (and is usually relatively small).  To get a better picture
+          of the heap profile of your program, use
+          the <option>-hT</option> RTS option
+          (<xref linkend="rts-profiling" />).
         </para>
       </listitem>
       <listitem>
@@ -618,14 +629,14 @@
       <listitem>
         <para>
         The "bytes allocated in the heap" is the total bytes allocated
-        by the program. This may be less than the peak memory use, as
-        some may be freed.
+        by the program over the whole run.
         </para>
       </listitem>
       <listitem>
         <para>
-        GHC uses a copying garbage collector. "bytes copied during GC" 
-        tells you how many bytes it had to copy during garbage collection.
+        GHC uses a copying garbage collector by default. "bytes copied
+        during GC" tells you how many bytes it had to copy during
+        garbage collection.
         </para>
       </listitem>
       <listitem>
@@ -639,7 +650,10 @@
       <listitem>
         <para>
         The "bytes maximum slop" tells you the most space that is ever
-        wasted due to the way GHC packs data into so-called "megablocks".
+        wasted due to the way GHC allocates memory in blocks.  Slop is
+        memory at the end of a block that was wasted.  There's no way
+        to control this; we just like to see how much memory is being
+        lost this way.
         </para>
       </listitem>
       <listitem>
@@ -652,7 +666,7 @@
         <para>
         Next there is information about the garbage collections done.
         For each generation it says how many garbage collections were
-        done, how many of those collections used multiple threads,
+        done, how many of those collections were done in parallel,
         the total CPU time used for garbage collecting that generation,
         and the total wall clock time elapsed while garbage collecting
         that generation.
@@ -671,8 +685,8 @@
       </listitem>
       <listitem>
         <para>
-        Next there is the CPU time and wall clock time elapsedm broken
-        down by what the runtiem system was doing at the time.
+        Next there is the CPU time and wall clock time elapsed, broken
+        down by what the runtime system was doing at the time.
         INIT is the runtime system initialisation.
         MUT is the mutator time, i.e. the time spent actually running
         your code.
@@ -803,6 +817,92 @@
     </variablelist>
   </sect2>
 
+  <sect2 id="rts-eventlog">
+    <title>Tracing</title>
+
+    <indexterm><primary>tracing</primary></indexterm>
+    <indexterm><primary>events</primary></indexterm>
+    <indexterm><primary>eventlog files</primary></indexterm>
+
+    <para>
+      When the program is linked with the <option>-eventlog</option>
+      option (<xref linkend="options-linker" />), runtime events can
+      be logged in two ways:
+    </para>
+
+    <itemizedlist>
+      <listitem>
+        <para>
+          In binary format to a file for later analysis by a
+          variety of tools.  One such tool
+          is <ulink url="http://hackage.haskell.org/package/ThreadScope">ThreadScope</ulink><indexterm><primary>ThreadScope</primary></indexterm>,
+          which interprets the event log to produce a visual parallel
+          execution profile of the program.
+        </para>
+      </listitem>
+      <listitem>
+        <para>
+          As text to standard output, for debugging purposes.
+        </para>
+      </listitem>
+    </itemizedlist>
+
+    <variablelist>
+      <varlistentry>
+        <term>
+          <option>-l<optional><replaceable>type</replaceable></optional></option>
+          <indexterm><primary><option>-l</option></primary><secondary>RTS option</secondary></indexterm>
+        </term>
+        <listitem>
+          <para>
+            Log events in binary format to the
+            file <filename><replaceable>program</replaceable>.eventlog</filename>,
+            where <replaceable>type</replaceable> indicates the type
+            of events to log.  Currently there is only one type
+            supported: <literal>-ls</literal>, for scheduler events.
+          </para>
+
+          <para>
+            The format of the log file is described by the header
+            <filename>EventLogFormat.h</filename> that comes with
+            GHC, and it can be parsed in Haskell using
+            the <ulink url="http://hackage.haskell.org/package/ghc-events">ghc-events</ulink>
+            library.  To dump the contents of
+            a <literal>.eventlog</literal> file as text, use the
+            tool <literal>show-ghc-events</literal> that comes with
+            the <ulink url="http://hackage.haskell.org/package/ghc-events">ghc-events</ulink>
+            package.
+          </para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term>
+          <option>-v</option>
+          <indexterm><primary><option>-v</option></primary><secondary>RTS option</secondary></indexterm>
+        </term>
+        <listitem>
+          <para>
+            Log events as text to standard output, instead of to
+            the <literal>.eventlog</literal> file.
+          </para>
+        </listitem>
+      </varlistentry>
+
+    </variablelist>
+
+    <para>
+      The debugging
+      options <option>-D<replaceable>x</replaceable></option> also
+      generate events which are logged using the tracing framework.
+      By default those events are dumped as text to stdout
+      (<option>-D<replaceable>x</replaceable></option>
+      implies <option>-v</option>), but they may instead be stored in
+      the binary eventlog file by using the <option>-l</option>
+      option.
+    </para>
+  </sect2>
+
   <sect2 id="rts-options-debugging">
     <title>RTS options for hackers, debuggers, and over-interested
     souls</title>
@@ -839,14 +939,28 @@
 
       <varlistentry>
 	<term>
-          <option>-D</option><replaceable>num</replaceable>
+          <option>-D</option><replaceable>x</replaceable>
           <indexterm><primary>-D</primary><secondary>RTS option</secondary></indexterm>
         </term>
 	<listitem>
-	  <para>An RTS debugging flag; varying quantities of output
-          depending on which bits are set in
-          <replaceable>num</replaceable>.  Only works if the RTS was
-          compiled with the <option>DEBUG</option> option.</para>
+	  <para>
+            An RTS debugging flag; only available if the program was
+	    linked with the <option>-debug</option> option.  Various
+	    values of <replaceable>x</replaceable> are provided to
+	    enable debug messages and additional runtime sanity checks
+	    in different subsystems in the RTS, for
+	    example <literal>+RTS -Ds -RTS</literal> enables debug
+	    messages from the scheduler.
+	    Use <literal>+RTS&nbsp;-?</literal> to find out which
+	    debug flags are supported.
+          </para>
+
+          <para>
+            Debug messages will be sent to the binary event log file
+            instead of stdout if the <option>-l</option> option is
+            added.  This might be useful for reducing the overhead of
+            debug tracing.
+          </para>
 	</listitem>
       </varlistentry>
 
@@ -1020,12 +1134,22 @@ char *ghc_rts_opts = "-H128m -K1m";
     itself. To do this, use the <option>--info</option> flag, e.g.</para>
 <screen>
 $ ./a.out +RTS --info
- [("GHC RTS", "Yes")
+ [("GHC RTS", "YES")
  ,("GHC version", "6.7")
  ,("RTS way", "rts_p")
  ,("Host platform", "x86_64-unknown-linux")
+ ,("Host architecture", "x86_64")
+ ,("Host OS", "linux")
+ ,("Host vendor", "unknown")
  ,("Build platform", "x86_64-unknown-linux")
+ ,("Build architecture", "x86_64")
+ ,("Build OS", "linux")
+ ,("Build vendor", "unknown")
  ,("Target platform", "x86_64-unknown-linux")
+ ,("Target architecture", "x86_64")
+ ,("Target OS", "linux")
+ ,("Target vendor", "unknown")
+ ,("Word size", "64")
  ,("Compiler unregisterised", "NO")
  ,("Tables next to code", "YES")
  ]
@@ -1039,8 +1163,8 @@ $ ./a.out +RTS --info
       <varlistentry>
         <term><literal>GHC RTS</literal></term>
         <listitem>
-          <para>Is this program linked against the GHC RTS? (Currently
-          the answer is always yes.)</para>
+          <para>Is this program linked against the GHC RTS? (always
+          "YES").</para>
         </listitem>
       </varlistentry>
 
@@ -1054,45 +1178,71 @@ $ ./a.out +RTS --info
       <varlistentry>
         <term><literal>RTS way</literal></term>
         <listitem>
-          <para>The variant (&ldquo;way&rdquo;) of the runtime. Possible
-          values are <literal>rts</literal> (vanilla), 
+          <para>The variant (&ldquo;way&rdquo;) of the runtime. The
+          most common values are <literal>rts</literal> (vanilla),
           <literal>rts_thr</literal> (threaded runtime, i.e. linked using the
           <literal>-threaded</literal> option) and <literal>rts_p</literal>
           (profiling runtime, i.e. linked using the <literal>-prof</literal>
-          option). Other variants include <literal>t</literal>
-          (ticky-ticky profiling) and <literal>dyn</literal> (the RTS is
+          option). Other variants include <literal>debug</literal>
+          (linked using <literal>-debug</literal>),
+          <literal>t</literal> (ticky-ticky profiling) and
+          <literal>dyn</literal> (the RTS is
           linked in dynamically, i.e. a shared library, rather than statically
-          linked into the executable itself).</para>
+          linked into the executable itself). These can be combined,
+          e.g. you might have <literal>rts_thr_debug_p</literal>.</para>
         </listitem>
       </varlistentry>
 
       <varlistentry>
-        <term><literal>Target platform</literal></term>
+        <term>
+            <literal>Target platform</literal>,
+            <literal>Target architecture</literal>,
+            <literal>Target OS</literal>,
+            <literal>Target vendor</literal>
+        </term>
         <listitem>
-          <para>This is the platform the program is compiled to run on.</para>
+          <para>These describe the platform the program is compiled to run on.</para>
         </listitem>
       </varlistentry>
 
       <varlistentry>
-        <term><literal>Build platform</literal></term>
+        <term>
+            <literal>Build platform</literal>,
+            <literal>Build architecture</literal>,
+            <literal>Build OS</literal>,
+            <literal>Build vendor</literal>
+        </term>
         <listitem>
-          <para>This is the platform where the program was compiled
-          from. (That is, the target platform of GHC itself.) Ordinarily
+          <para>These describe the platform on which the program was
+          built. (That is, the target platform of GHC itself.) Ordinarily
           this is identical to the target platform. (It could potentially
           be different if cross-compiling.)</para>
         </listitem>
       </varlistentry>
 
       <varlistentry>
-        <term><literal>Host platform</literal></term>
+        <term>
+            <literal>Host platform</literal>,
+            <literal>Host architecture</literal>,
+            <literal>Host OS</literal>,
+            <literal>Host vendor</literal>
+        </term>
         <listitem>
-          <para>This is the platform where GHC itself was compiled.
+          <para>These describe the platform where GHC itself was compiled.
           Again, this would normally be identical to the build and
           target platforms.</para>
         </listitem>
       </varlistentry>
 
       <varlistentry>
+        <term><literal>Word size</literal></term>
+        <listitem>
+          <para>Either <literal>"32"</literal> or <literal>"64"</literal>,
+          reflecting the word size of the target platform.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
         <term><literal>Compiler unregistered</literal></term>
         <listitem>
           <para>Was this program compiled with an &ldquo;unregistered&rdquo;