X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=ghc%2Fdocs%2Fusers_guide%2Fprofiling.sgml;h=e79e63824af5a1fb52504c45f3c8e6e8c43fa3f8;hb=57d3c7c8a8523e375d49d2ac036210b6dcf7ec9c;hp=30175aed34d08c7315145a30fe1a80a0630f1376;hpb=66e87ae1ac00d54df5024033fda5d08db99177a4;p=ghc-hetmet.git diff --git a/ghc/docs/users_guide/profiling.sgml b/ghc/docs/users_guide/profiling.sgml index 30175ae..e79e638 100644 --- a/ghc/docs/users_guide/profiling.sgml +++ b/ghc/docs/users_guide/profiling.sgml @@ -4,13 +4,13 @@ cost-centre profiling - Glasgow Haskell comes with a time and space profiling + Glasgow Haskell comes with a time and space profiling system. Its purpose is to help you improve your understanding of - your program's execution behaviour, so you can improve it. + your program's execution behaviour, so you can improve it. - Any comments, suggestions and/or improvements you have are + Any comments, suggestions and/or improvements you have are welcome. Recommended “profiling tricks” would be - especially cool! + especially cool! Profiling a program is a three-step process: @@ -30,12 +30,10 @@ - Run your program with one of the profiling options - -p or -h. This generates - a file of profiling information. - -pRTS - option - -hRTS + Run your program with one of the profiling options, eg. + +RTS -p -RTS. This generates a file of + profiling information. 
+ RTS option @@ -47,7 +45,7 @@ - + Cost centres and cost-centre stacks GHC's profiling system assigns costs @@ -81,7 +79,7 @@ $ will contain something like this: - Tue Apr 18 12:52 2000 Time and Allocation Profiling Report (Final) + Fri May 12 14:06 2000 Time and Allocation Profiling Report (Final) Main +RTS -p -RTS @@ -93,15 +91,16 @@ COST CENTRE MODULE %time %alloc nfib Main 100.0 100.0 -COST CENTRE MODULE scc %time %alloc inner cafs + individual inherited +COST CENTRE MODULE entries %time %alloc %time %alloc -MAIN MAIN 0 0.0 0.0 0 1 - main Main 0 0.0 0.0 0 1 - CAF PrelHandle 3 0.0 0.0 0 3 - CAF PrelAddr 1 0.0 0.0 0 0 - CAF Main 6 0.0 0.0 1 0 - main Main 1 0.0 0.0 1 1 - nfib Main 242785 100.0 100.0 242784 4 +MAIN MAIN 0 0.0 0.0 100.0 100.0 + main Main 0 0.0 0.0 0.0 0.0 + CAF PrelHandle 3 0.0 0.0 0.0 0.0 + CAF PrelAddr 1 0.0 0.0 0.0 0.0 + CAF Main 6 0.0 0.0 100.0 100.0 + main Main 1 0.0 0.0 100.0 100.0 + nfib Main 242785 100.0 100.0 100.0 100.0 @@ -125,6 +124,12 @@ MAIN MAIN 0 0.0 0.0 0 1 the costly call to nfib came from main. + The time and allocation incurred by a given part of the + program is displayed in two ways: “individual”, which + are the costs incurred by the code covered by this cost centre + stack alone, and “inherited”, which includes the costs + incurred by all the children of this node. 
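The relationship between the two kinds of column can be seen in a small sketch (the function names `foo` and `bar` are invented for illustration, not taken from the report above). `bar` does all the real work, and `foo` merely calls it, so in a profile `foo`'s individual columns stay near zero while its inherited columns include everything done by `bar`:

```haskell
module Main where

-- bar does the allocation; foo is cheap itself but inherits bar's
-- costs, because bar appears below foo in the cost-centre stack.

bar :: Int -> Int
bar n = {-# SCC "bar" #-} sum [1 .. n]   -- the real work happens here

foo :: Int -> Int
foo n = {-# SCC "foo" #-} bar n + 1      -- near-zero individual cost

main :: IO ()
main = print (foo 100000)
```

Compiled with <option>-prof</option> and run with <option>+RTS -p</option>, `foo`'s inherited %time/%alloc would be roughly equal to `bar`'s individual figures.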
+ The usefulness of cost-centre stacks is better demonstrated by modifying the example slightly: @@ -139,18 +144,18 @@ nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) the new profiling results: -COST CENTRE MODULE scc %time %alloc inner cafs - -MAIN MAIN 0 0.0 0.0 0 1 - main Main 0 0.0 0.0 0 1 - CAF PrelHandle 3 0.0 0.0 0 3 - CAF PrelAddr 1 0.0 0.0 0 0 - CAF Main 9 0.0 0.0 1 1 - main Main 1 0.0 0.0 2 2 - g Main 1 0.0 0.0 1 3 - nfib Main 465 0.0 0.2 464 0 - f Main 1 0.0 0.0 1 1 - nfib Main 242785 100.0 99.8 242784 1 +COST CENTRE MODULE scc %time %alloc %time %alloc + +MAIN MAIN 0 0.0 0.0 100.0 100.0 + main Main 0 0.0 0.0 0.0 0.0 + CAF PrelHandle 3 0.0 0.0 0.0 0.0 + CAF PrelAddr 1 0.0 0.0 0.0 0.0 + CAF Main 9 0.0 0.0 100.0 100.0 + main Main 1 0.0 0.0 100.0 100.0 + g Main 1 0.0 0.0 0.0 0.2 + nfib Main 465 0.0 0.2 0.0 0.2 + f Main 1 0.0 0.0 100.0 99.8 + nfib Main 242785 100.0 99.8 100.0 99.8 Now although we had two calls to nfib @@ -161,7 +166,7 @@ MAIN MAIN 0 0.0 0.0 0 1 - scc + entries The number of times this particular point in the call graph was entered. @@ -169,7 +174,7 @@ MAIN MAIN 0 0.0 0.0 0 1 - %time + individual %time The percentage of the total run time of the program spent at this point in the call graph. @@ -177,7 +182,7 @@ MAIN MAIN 0 0.0 0.0 0 1 - %alloc + individual %alloc The percentage of the total memory allocations (excluding profiling overheads) of the program made by this @@ -186,19 +191,19 @@ MAIN MAIN 0 0.0 0.0 0 1 - inner + inherited %time - The number of times an inner call-graph context was - entered from here (including recursive calls). + The percentage of the total run time of the program + spent below this point in the call graph. - cafs + inherited %alloc - The number of times a CAF context was entered from - here. CAFs are described in . + The percentage of the total memory allocations + (excluding profiling overheads) of the program made by this + call and all of its sub-calls. 
@@ -211,20 +216,20 @@ MAIN MAIN 0 0.0 0.0 0 1
 	    ticks
 	    
 	      
-	      The raw number of time “ticks” which were
+	      The raw number of time “ticks” which were
 	      attributed to this cost-centre; from this, we get the
 	      %time figure mentioned
-	      above.
+	      above.
 	    
 	  
 	  
 	    bytes
 	    
 	      
-	      Number of bytes allocated in the heap while in this
+	      Number of bytes allocated in the heap while in this
 	      cost-centre; again, this is the raw number from which we get
 	      the %alloc figure mentioned
-	      above.
+	      above.
 	    
 	  
 	
@@ -249,14 +254,14 @@ MAIN MAIN 0 0.0 0.0 0 1
 
       The syntax of a cost centre annotation is
 
-	_scc_ "name" <expression>
+	{-# SCC "name" #-} <expression>
 
       where "name" is an arbitrary string that will become the
       name of your cost centre as it appears in the profiling output,
      and <expression> is any Haskell
-      expression.  An _scc_ annotation extends as
+      expression.  An SCC annotation extends as
      far to the right as possible when parsing.
 
@@ -272,14 +277,14 @@ MAIN MAIN 0 0.0 0.0 0 1
 
 	  If the expression is part of the one-off costs of evaluating
 	  the enclosing top-level definition, then costs are attributed to
-	  the stack of lexically enclosing _scc_
+	  the stack of lexically enclosing SCC
 	  annotations on top of the special CAF
 	  cost-centre.
 
 
 
 	  Otherwise, costs are attributed to the stack of
-	  lexically-enclosing _scc_ annotations,
+	  lexically-enclosing SCC annotations,
 	  appended to the cost-centre stack in effect at the call
 	  site of the current top-level definition
 	  The call-site is just the place
@@ -287,6 +292,12 @@ MAIN MAIN 0 0.0 0.0 0 1
 	  variable..  Notice that this is a recursive
 	  definition.
 
+
+      
+	Time spent in foreign code (see )
+	is always attributed to the cost centre in force at the
+	Haskell call-site of the foreign function.
+      
 
       What do we mean by one-off costs?  Well, Haskell is a lazy
@@ -322,143 +333,80 @@ x = nfib 25
       doesn't look like you expect it to, feel free to send it (and
       your program) to us at glasgow-haskell-bugs@haskell.org.
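The scoping and CAF rules above can be sketched in a few lines (the names `both` and `table` are invented for illustration). The annotation extends as far to the right as possible, so the first SCC covers the whole of `f x + g x`, not just `f x`; the second sits inside a CAF, so its one-off evaluation cost lands on the CAF cost-centre stack rather than on whichever caller happens to force it first:

```haskell
module Main where

f, g :: Int -> Int
f y = y * 2
g y = y + 1

-- The SCC extends to the whole expression (f x + g x):
both :: Int -> Int
both x = {-# SCC "both" #-} f x + g x

-- A top-level definition with no arguments is a CAF; its one-off
-- cost is attributed to the special CAF cost centre plus the
-- enclosing SCC, not to the call site that forces it.
table :: [Int]
table = {-# SCC "build_table" #-} map (* 2) [1 .. 1000]

main :: IO ()
main = print (both 10 + sum table)
```

Without <option>-prof</option> the pragmas are simply ignored, so the same source compiles either way.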
- - - Profiling memory usage - - In addition to profiling the time and allocation behaviour - of your program, you can also generate a graph of its memory usage - over time. This is useful for detecting the causes of - space leaks, when your program holds on to - more memory at run-time that it needs to. Space leaks lead to - longer run-times due to heavy garbage collector ativity, and may - even cause the program to run out of memory altogether. - - To generate a heap profile from your program, compile it as - before, but this time run it with the runtime - option. This generates a file - <prog>.hp file, which you then process - with hp2ps to produce a Postscript file - <prog>.ps. The Postscript file can be - viewed with something like ghostview, or - printed out on a Postscript-compatible printer. - - For the RTS options that control the kind of heap profile - generated, see . Details on the - usage of the hp2ps program are given in - - - - - Graphical time/allocation profile - - You can view the time and allocation profiling graph of your - program graphically, using ghcprof. This is a - new tool with GHC 4.07, and will eventually be the de-facto - standard way of viewing GHC profiles. - - To run ghcprof, you need - daVinci installed, which can be - obtained from The Graph - Visualisation Tool daVinci. Install one of - the binary - distributionsdaVinci is - sadly not open-source :-(., and set your - DAVINCIHOME environment variable to point to the - installation directory. - - ghcprof uses an XML-based profiling log - format, and you therefore need to run your program with a - different option: . The file generated is - still called <prog>.prof. To see the - profile, run ghcprof like this: - - - - -$ ghcprof <prog>.prof - - - which should pop up a window showing the call-graph of your - program in glorious detail. More information on using - ghcprof can be found at The - Cost-Centre Stack Profiling Tool for - GHC. 
- - - Compiler options for profiling profilingoptions optionsfor profiling - To make use of the cost centre profiling system - all modules must be compiled and linked with - the option. Any - _scc_ constructs you've put in - your source will spring to life. - - -prof - - Without a option, your - _scc_s are ignored; so you can - compiled _scc_-laden code - without changing it. - - There are a few other profiling-related compilation options. - Use them in addition to - . These do not have to be used consistently - for all modules in a program. - - - : - -auto + : + + + To make use of the profiling system + all modules must be compiled and linked + with the option. Any + SCC annotations you've put in your source + will spring to life. + + Without a option, your + SCCs are ignored; so you can compile + SCC-laden code without changing + it. + + + + + There are a few other profiling-related compilation options. + Use them in addition to + . These do not have to be used consistently + for all modules in a program. + + + + : + cost centresautomatically inserting - GHC will automatically add + GHC will automatically add _scc_ constructs for all - top-level, exported functions. + top-level, exported functions. - : - -auto-all + : + - All top-level functions, + All top-level functions, exported or not, will be automatically - _scc_'d. + _scc_'d. - : - -caf-all + : + - The costs of all CAFs in a module are usually + The costs of all CAFs in a module are usually attributed to one “big” CAF cost-centre. With this option, all CAFs get their own cost-centre. An - “if all else fails” option… + “if all else fails” option… - : - -ignore-scc + : + - Ignore any _scc_ + Ignore any _scc_ constructs, so a module which already has _scc_s can be compiled - for profiling with the annotations ignored. + for profiling with the annotations ignored. 
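The effect of <option>-auto-all</option> can be pictured roughly as follows: it is a sketch of the hand-written equivalent, not the compiler's literal transformation, and the function names are invented. Every top-level definition behaves as if wrapped in an SCC bearing its own name:

```haskell
module Main where

-- What -prof -auto-all approximately does by itself: each
-- top-level function gets an SCC named after it, so the profile
-- breaks time and allocation down per function with no manual
-- annotation.

step :: Int -> Int
step n = {-# SCC "step" #-} n * n + 1

run :: Int -> Int
run n = {-# SCC "run" #-} sum (map step [1 .. n])

main :: IO ()
main = print (run 100)
```

With <option>-auto</option> alone, only the exported top-level functions would be annotated this way.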
@@ -466,47 +414,29 @@ $ ghcprof <prog>.prof - - Runtime options for profiling - - profiling RTS options - RTS options, for profiling + + Time and allocation profiling - It isn't enough to compile your program for profiling with - ! - - When you run your profiled program, you - must tell the runtime system (RTS) what you want to profile (e.g., - time and/or space), and how you wish the collected data to be - reported. You also may wish to set the sampling interval used in - time profiling. - - Executive summary: ./a.out +RTS -pT - produces a time profile in a.out.prof; - ./a.out +RTS -hC produces space-profiling info - which can be mangled by hp2ps and viewed with - ghostview (or equivalent). - - Profiling runtime flags are passed to your program between - the usual and - options. + To generate a time and allocation profile, give one of the + following RTS options to the compiled program when you run it (RTS + options should be enclosed between +RTS...-RTS + as usual): - or : time profile - The option produces a standard + The option produces a standard time profile report. It is written into the file - <program>.prof. + program.prof. - The option produces a more + The option produces a more detailed report containing the actual time and allocation - data as well. (Not used much.) + data as well. (Not used much.) @@ -521,89 +451,471 @@ $ ghcprof <prog>.prof - : - - - Set the profiling (sampling) interval to - <secs> seconds (the default is - 1 second). Fractions are allowed: for example - will get 5 samples per second. This - only affects heap profiling; time profiles are always - sampled on a 1/50 second frequency. - + + RTS + option + + This option makes use of the extra information + maintained by the cost-centre-stack profiler to provide + useful information about the location of runtime errors. + See . + - - : - - heap profile - - Produce a detailed heap profile - of the heap occupied by live closures. 
The profile is - written to the file <program>.hp - from which a PostScript graph can be produced using - hp2ps (see ). - - The heap space profile may be broken down by different - criteria: - - - - - : - - cost centre which produced the closure (the - default). - - - - - : - - cost centre module which produced the - closure. - - - - - : - - closure description—a string describing - the closure. - - - - - : - - closure type—a string describing the - closure's type. - - - + + + - - + + Profiling memory usage - - : - + In addition to profiling the time and allocation behaviour + of your program, you can also generate a graph of its memory usage + over time. This is useful for detecting the causes of + space leaks, when your program holds on to + more memory at run-time that it needs to. Space leaks lead to + longer run-times due to heavy garbage collector ativity, and may + even cause the program to run out of memory altogether. + + To generate a heap profile from your program: + + + + Compile the program for profiling (). + + + Run it with one of the heap profiling options described + below (eg. for a basic producer profile). + This generates the file + prog.hp. + + + Run hp2ps to produce a Postscript + file, + prog.ps. The + hp2ps utility is described in detail in + . + + + Display the heap profile using a postscript viewer such + as Ghostview, or print it out on a + Postscript-capable printer. + + + + + RTS options for heap profiling + + There are several different kinds of heap profile that can + be generated. All the different profile types yield a graph of + live heap against time, but they differ in how the live heap is + broken down into bands. The following RTS options select which + break-down to use: + + + + + RTS + option + + Breaks down the graph by the cost-centre stack which + produced the data. + + + + + + RTS + option + + Break down the live heap by the module containing + the code which produced the data. 
+ + + + + + RTS + option + + Breaks down the graph by closure + description. For actual data, the description + is just the constructor name, for other closures it is a + compiler-generated string identifying the closure. + + + + + + RTS + option + + Breaks down the graph by + type. For closures which have + function type or unknown/polymorphic type, the string will + represent an approximation to the actual type. + + + + + + RTS + option + + Break down the graph by retainer + set. Retainer profiling is described in more + detail below (). + + + + + + RTS + option + + Break down the graph by + biography. Biographical profiling + is described in more detail below (). + + + + + In addition, the profile can be restricted to heap data + which satisfies certain criteria - for example, you might want + to display a profile by type but only for data produced by a + certain module, or a profile by retainer for a certain type of + data. Restrictions are specified as follows: + + + + name,... + RTS + option + + Restrict the profile to closures produced by + cost-centre stacks with one of the specified cost centres + at the top. + + + + + name,... + RTS + option + + Restrict the profile to closures produced by + cost-centre stacks with one of the specified cost centres + anywhere in the stack. + + + + + module,... + RTS + option + + Restrict the profile to closures produced by the + specified modules. + + + + + desc,... + RTS + option + + Restrict the profile to closures with the specified + description strings. + + + + + type,... + RTS + option + + Restrict the profile to closures with the specified + types. + + + + + cc,... + RTS + option + + Restrict the profile to closures with retainer sets + containing cost-centre stacks with one of the specified + cost centres at the top. + + + + + bio,... + RTS + option + + Restrict the profile to closures with one of the + specified biographies, where + bio is one of + lag, drag, + void, or use. 
+ + + + + For example, the following options will generate a + retainer profile restricted to Branch and + Leaf constructors: + + +prog +RTS -hr -hdBranch,Leaf + + + There can only be one "break-down" option + (eg. in the example above), but there is no + limit on the number of further restrictions that may be applied. + All the options may be combined, with one exception: GHC doesn't + currently support mixing the and + options. + + There are two more options which relate to heap + profiling: + + + + : + + + Set the profiling (sampling) interval to + secs seconds (the default is + 0.1 second). Fractions are allowed: for example + will get 5 samples per second. + This only affects heap profiling; time profiles are always + sampled on a 1/50 second frequency. + + + + + + RTS option + + + Include the memory occupied by threads in a heap + profile. Each thread takes up a small area for its thread + state in addition to the space allocated for its stack + (stacks normally start small and then grow as + necessary). + + This includes the main thread, so using + is a good way to see how much stack + space the program is using. + + Memory occupied by threads and their stacks is + labelled as “TSO” when displaying the profile + by closure description or type description. + + + + + + + + Retainer Profiling + + Retainer profiling is designed to help answer questions + like why is this data being retained?. We start + by defining what we mean by a retainer: + +
+ A retainer is either the system stack, or an unevaluated + closure (thunk). +
In particular, constructors are not
      retainers.

      An object A is retained by an object B if object A can be
      reached by recursively following pointers starting from object
      B but not meeting any other retainers on the way.  Each object
      has one or more retainers, collectively called its
      retainer set.

      When retainer profiling is requested by giving the program
      the  option, a graph is generated which is
      broken down by retainer set.  A retainer set is displayed as a
      set of cost-centre stacks; because this is usually too large to
      fit on the profile graph, each retainer set is numbered and
      shown abbreviated on the graph along with its number, and the
      full list of retainer sets is dumped into the file
      prog.prof.

      Retainer profiling requires multiple passes over the live
      heap in order to discover the full retainer set for each
      object, which can be quite slow.  So we set a limit on the
      maximum size of a retainer set, where all retainer sets larger
      than the maximum retainer set size are replaced by the special
      set MANY.  The maximum set size defaults to 8
      and can be altered with the  RTS
      option:

      
	
	  size
	  
	    Restrict the number of elements in a retainer set to
	    size (default 8).
	  
	
      

      
	Hints for using retainer profiling

	The definition of retainers is designed to reflect a
	common cause of space leaks: a large structure is retained by
	an unevaluated computation, and will be released once the
	computation is forced.  A good example is looking up a value in
	a finite map, where unless the lookup is forced in a timely
	manner the unevaluated lookup will cause the whole mapping to
	be retained.  These kinds of space leaks can often be
	eliminated by forcing the relevant computations to be
	performed eagerly, using seq or strictness
	annotations on data constructor fields.
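The finite-map example above might look like this in code (the function names are invented; `Data.Map` comes with GHC in the containers package). The lazy lookup leaves behind an unevaluated closure that retains the whole map, while forcing the result with `seq` lets the map be garbage-collected as soon as the lookup has run:

```haskell
module Main where

import qualified Data.Map as Map

-- If the caller never forces the result, this application sits on
-- the heap as a thunk retaining all of m:
lazyLookup :: Map.Map Int String -> Maybe String
lazyLookup m = Map.lookup 42 m

-- Forcing the lookup eagerly means m is no longer reachable
-- through the result, so it can be collected:
strictLookup :: Map.Map Int String -> Maybe String
strictLookup m = let r = Map.lookup 42 m
                 in r `seq` r

main :: IO ()
main = do
  let m = Map.fromList [(i, show i) | i <- [1 .. 10000]]
  print (strictLookup m)
```

A retainer profile of the lazy version would show the map's heap retained by the unevaluated lookup closure rather than by any constructor.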
+ + Often a particular data structure is being retained by a + chain of unevaluated closures, only the nearest of which will + be reported by retainer profiling - for example A retains B, B + retains C, and C retains a large structure. There might be a + large number of Bs but only a single A, so A is really the one + we're interested in eliminating. However, retainer profiling + will in this case report B as the retainer of the large + structure. To move further up the chain of retainers, we can + ask for another retainer profile but this time restrict the + profile to B objects, so we get a profile of the retainers of + B: + + +prog +RTS -hr -hcB + + + This trick isn't foolproof, because there might be other + B closures in the heap which aren't the retainers we are + interested in, but we've found this to be a useful technique + in most cases. + +
+ + + Biographical Profiling + + A typical heap object may be in one of the following four + states at each point in its lifetime: + + - The option generates heap - profiling information in the XML format understood by our - new profiling tool (NOTE: heap profiling with the new tool - is not yet working! Use hp2ps-style heap - profiling for the time being). + The lag stage, which is the + time between creation and the first use of the + object, -
+ + the use stage, which lasts from + the first use until the last use of the object, and + + + The drag stage, which lasts + from the final use until the last reference to the object + is dropped. + + + An object which is never used is said to be in the + void state for its whole + lifetime. + + + + A biographical heap profile displays the portion of the + live heap in each of the four states listed above. Usually the + most interesting states are the void and drag states: live heap + in these states is more likely to be wasted space than heap in + the lag or use states. + + It is also possible to break down the heap in one or more + of these states by a different criteria, by restricting a + profile by biography. For example, to show the portion of the + heap in the drag or void state by producer: + + +prog +RTS -hc -hbdrag,void + + + Once you know the producer or the type of the heap in the + drag or void states, the next step is usually to find the + retainer(s): + + +prog +RTS -hr -hccc... + + + NOTE: this two stage process is required because GHC + cannot currently profile using both biographical and retainer + information simultaneously. + + + + + + +
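A drag stage can be provoked deliberately, which is useful for checking that a biographical profile behaves as expected. The following is an invented sketch: the list is created and used once, but stays reachable through an `IORef` that is never read, so between the first `print` and the final write its cells are live heap in the drag state:

```haskell
module Main where

import Data.IORef

buildList :: Int -> [Int]
buildList n = [1 .. n]

main :: IO ()
main = do
  let xs = buildList 200000
  print (sum xs)        -- creation and last use of xs
  ref <- newIORef xs    -- keeps xs reachable, but never uses it again
  putStrLn "working..." -- xs is in its drag stage here
  writeIORef ref []     -- reference dropped: the drag stage ends
```

Run with <option>+RTS -hb</option>, a program shaped like this should show a band of drag between the two events; restricting with <option>-hbdrag,void</option> isolates it.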
+ + + Graphical time/allocation profile + + You can view the time and allocation profiling graph of your + program graphically, using ghcprof. This is a + new tool with GHC 4.08, and will eventually be the de-facto + standard way of viewing GHC profilesActually this + isn't true any more, we are working on a new tool for + displaying heap profiles using Gtk+HS, so + ghcprof may go away at some point in the future. + + + To run ghcprof, you need + daVinci installed, which can be + obtained from The Graph + Visualisation Tool daVinci. Install one of + the binary + distributionsdaVinci is + sadly not open-source :-(., and set your + DAVINCIHOME environment variable to point to the + installation directory. + + ghcprof uses an XML-based profiling log + format, and you therefore need to run your program with a + different option: . The file generated is + still called <prog>.prof. To see the + profile, run ghcprof like this: + + + + +$ ghcprof <prog>.prof + + + which should pop up a window showing the call-graph of your + program in glorious detail. More information on using + ghcprof can be found at The + Cost-Centre Stack Profiling Tool for + GHC. - - - <command>hp2ps</command>--heap profile to PostScript + <command>hp2ps</command>––heap profile to PostScript hp2ps heap profiles @@ -777,6 +1089,132 @@ hp2ps [flags] [<file>[.hp]] + + + + Manipulating the hp file + +(Notes kindly offered by Jan-Willhem Maessen.) + + +The FOO.hp file produced when you ask for the +heap profile of a program FOO is a text file with a particularly +simple structure. Here's a representative example, with much of the +actual data omitted: + +JOB "FOO -hC" +DATE "Thu Dec 26 18:17 2002" +SAMPLE_UNIT "seconds" +VALUE_UNIT "bytes" +BEGIN_SAMPLE 0.00 +END_SAMPLE 0.00 +BEGIN_SAMPLE 15.07 + ... sample data ... +END_SAMPLE 15.07 +BEGIN_SAMPLE 30.23 + ... sample data ... +END_SAMPLE 30.23 +... etc. 
BEGIN_SAMPLE 11695.47
END_SAMPLE 11695.47

The first four lines (JOB, DATE, SAMPLE_UNIT, VALUE_UNIT) form a
header.  Each block of lines starting with BEGIN_SAMPLE and ending
with END_SAMPLE forms a single sample (you can think of this as a
vertical slice of your heap profile).  The hp2ps utility should accept
any input with a properly-formatted header followed by a series of
*complete* samples.




 Zooming in on regions of your profile


You can look at particular regions of your profile simply by loading a
copy of the .hp file into a text editor and deleting the unwanted
samples.  The resulting .hp file can be run through hp2ps and viewed
or printed.




 Viewing the heap profile of a running program


The .hp file is generated incrementally as your
program runs.  In principle, running hp2ps on the incomplete file
should produce a snapshot of your program's heap usage.  However, the
last sample in the file may be incomplete, causing hp2ps to fail.  If
you are using a machine with UNIX utilities installed, it's not too
hard to work around this problem (though the resulting command line
looks rather Byzantine):

 head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
   | hp2ps > FOO.ps


The command fgrep -n END_SAMPLE FOO.hp finds the
end of every complete sample in FOO.hp, and labels each sample with
its ending line number.  We then select the line number of the last
complete sample using tail and cut.  This is used as a
parameter to head; the result is as if we deleted the final
incomplete sample from FOO.hp.  This results in a properly-formatted
.hp file which we feed directly to hp2ps.


 Viewing a heap profile in real time


The gv and ghostview programs
have a "watch file" option that can be used to view an up-to-date heap
profile of your program as it runs.  Simply generate an incremental
heap profile as described in the previous section.
Run gv on your +profile: + + gv -watch -seascape FOO.ps + +If you forget the -watch flag you can still select +"Watch file" from the "State" menu. Now each time you generate a new +profile FOO.ps the view will update automatically. + + + +This can all be encapsulated in a little script: + + #!/bin/sh + head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ + | hp2ps > FOO.ps + gv -watch -seascape FOO.ps & + while [ 1 ] ; do + sleep 10 # We generate a new profile every 10 seconds. + head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ + | hp2ps > FOO.ps + done + +Occasionally gv will choke as it tries to read an incomplete copy of +FOO.ps (because hp2ps is still running as an update +occurs). A slightly more complicated script works around this +problem, by using the fact that sending a SIGHUP to gv will cause it +to re-read its input file: + + #!/bin/sh + head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ + | hp2ps > FOO.ps + gv FOO.ps & + gvpsnum=$! + while [ 1 ] ; do + sleep 10 + head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ + | hp2ps > FOO.ps + kill -HUP $gvpsnum + done + + + + +
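The truncate-at-the-last-END_SAMPLE step used in the shell pipelines above can also be written as a small Haskell filter, which avoids the `fgrep`/`tail`/`cut` juggling. This is a sketch; `TruncateHp.hs` and `FOO.hp` are placeholder names:

```haskell
module Main where

import Data.List (isPrefixOf)

-- Keep everything up to and including the last END_SAMPLE line,
-- dropping any trailing partial sample, so the output is always a
-- well-formed .hp file.  Usage:
--
--   runghc TruncateHp.hs < FOO.hp | hp2ps > FOO.ps

lastComplete :: [String] -> [String]
lastComplete ls =
  case [ i | (i, l) <- zip [1 ..] ls, "END_SAMPLE" `isPrefixOf` l ] of
    []      -> ls                      -- no complete sample yet
    endings -> take (last endings) ls  -- cut after the last END_SAMPLE

main :: IO ()
main = interact (unlines . lastComplete . lines)
```

Because `interact` reads lazily from standard input, the same program works in the watch-file loops above in place of the `head`/`fgrep` pipeline.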