Reorganisation of the source tree
diff --git a/docs/users_guide/sooner.xml b/docs/users_guide/sooner.xml
new file mode 100644 (file)
index 0000000..1aba5d1
--- /dev/null
@@ -0,0 +1,602 @@
+<?xml version="1.0" encoding="iso-8859-1"?>
+<chapter id="sooner-faster-quicker">
+<title>Advice on: sooner, faster, smaller, thriftier</title>
+
+<para>Please advise us of other &ldquo;helpful hints&rdquo; that
+should go here!</para>
+
+<sect1 id="sooner">
+<title>Sooner: producing a program more quickly
+</title>
+
+<indexterm><primary>compiling faster</primary></indexterm>
+<indexterm><primary>faster compiling</primary></indexterm>
+
+    <variablelist>
+      <varlistentry>
+       <term>Don't use <option>-O</option> or (especially) <option>-O2</option>:</term>
+       <listitem>
+         <para>By using them, you are telling GHC that you are
+          willing to suffer longer compilation times for
+          better-quality code.</para>
+
+         <para>GHC is surprisingly zippy for normal compilations
+         without <option>-O</option>!</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Use more memory:</term>
+       <listitem>
+         <para>Within reason, more memory for heap space means less
+          garbage collection for GHC, which means less compilation
+          time.  If you use the <option>-Rghc-timing</option> option,
+          you'll get a garbage-collector report.  (Again, you can use
+          the cheap-and-nasty <option>+RTS -Sstderr -RTS</option>
+          option to send the GC stats straight to standard
+          error.)</para>
+
+         <para>If it says you're spending more than 20&percnt; of
+          total time in garbage collection, then more memory would
+          help.</para>
+
+         <para>If the heap size is approaching the maximum (64M by
+          default), and you have lots of memory, try increasing the
+          maximum with the
+          <option>-M&lt;size&gt;</option><indexterm><primary>-M&lt;size&gt;
+          RTS option</primary></indexterm> RTS option, e.g.:
+          <command>ghc -c -O Foo.hs +RTS -M1024m -RTS</command>.</para>
+
+         <para>Increasing the default allocation area size used by
+          the compiler's RTS might also help: use the
+          <option>-A&lt;size&gt;</option><indexterm><primary>-A&lt;size&gt;
+          option</primary></indexterm> option.</para>
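+
+         <para>A sketch of such an invocation (the module name and
+          the sizes here are purely illustrative):</para>
+
+<programlisting>
+% ghc -c -O Foo.hs +RTS -A64m -Sstderr -RTS
+</programlisting>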
+
+         <para>If GHC persists in being a bad memory citizen, please
+          report it as a bug.</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Don't use too much memory!</term>
+       <listitem>
+         <para>As soon as GHC plus its &ldquo;fellow citizens&rdquo;
+          (other processes on your machine) start using more than the
+          <emphasis>real memory</emphasis> on your machine, and the
+          machine starts &ldquo;thrashing,&rdquo; <emphasis>the party
+          is over</emphasis>.  Compile times will be worse than
+          terrible!  Use something like the csh-builtin
+          <command>time</command> command to get a report on how many
+          page faults you're getting.</para>
+
+         <para>If you don't know what virtual memory, thrashing, and
+          page faults are, or you don't know the memory configuration
+          of your machine, <emphasis>don't</emphasis> try to be clever
+          about memory use: you'll just make your life a misery (and
+          for other people, too, probably).</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Try to use local disks when linking:</term>
+       <listitem>
+         <para>Because Haskell objects and libraries tend to be
+          large, it can take many real seconds to slurp the bits
+          to/from a remote filesystem.</para>
+
+         <para>It would be quite sensible to
+          <emphasis>compile</emphasis> on a fast machine using
+          remotely-mounted disks, and then <emphasis>link</emphasis>
+          on a slow machine that has your disks directly
+          mounted.</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Don't derive/use <function>Read</function> unnecessarily:</term>
+       <listitem>
+         <para>It's ugly and slow.</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>GHC compiles some program constructs slowly:</term>
+       <listitem>
+         <para>Deeply-nested list comprehensions seem to be one such;
+          in the past, very large constant tables were bad,
+          too.</para>
+
+         <para>We'd rather you reported such behaviour as a bug, so
+          that we can try to correct it.</para>
+
+         <para>The part of the compiler that is occasionally prone to
+          wandering off for a long time is the strictness analyser.
+          You can turn this off individually with
+          <option>-fno-strictness</option>.
+          <indexterm><primary>-fno-strictness
+          anti-option</primary></indexterm></para>
+         
+         <para>To figure out which part of the compiler is badly
+          behaved, the
+          <option>-v2</option><indexterm><primary><option>-v</option></primary>
+          </indexterm> option is your friend.</para>
+
+         <para>If your module has big wads of constant data, GHC may
+          produce a huge basic block that will cause the native-code
+          generator's register allocator to founder.  Bring on
+          <option>-fvia-C</option><indexterm><primary>-fvia-C
+          option</primary></indexterm> (not that GCC will be that
+          quick about it, either).</para>
+       </listitem>
+      </varlistentry>
+      
+      <varlistentry>
+       <term>Explicit <literal>import</literal> declarations:</term>
+       <listitem>
+         <para>Instead of saying <literal>import Foo</literal>, say
+          <literal>import Foo (...stuff I want...)</literal>.  You can
+          get GHC to tell you the minimal set of required imports by
+          using the <option>-ddump-minimal-imports</option> option
+          (see <xref linkend="hi-options"/>).</para>
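+
+         <para>For instance (the module and the names imported below
+          are just an illustration):</para>
+
+<programlisting>
+import Data.List (sortBy, nubBy)   -- rather than:  import Data.List
+</programlisting>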
+
+         <para>Truthfully, the reduction on compilation time will be
+          very small.  However, judicious use of
+          <literal>import</literal> declarations can make a program
+          easier to understand, so it may be a good idea
+          anyway.</para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+  </sect1>
+
+  <sect1 id="faster">
+    <title>Faster: producing a program that runs quicker</title>
+
+    <indexterm><primary>faster programs, how to produce</primary></indexterm>
+
+    <para>The key tools to use in making your Haskell program run
+    faster are GHC's profiling facilities, described separately in
+    <xref linkend="profiling"/>.  There is <emphasis>no
+    substitute</emphasis> for finding where your program's time/space
+    is <emphasis>really</emphasis> going, as opposed to where you
+    imagine it is going.</para>
+
+    <para>Another point to bear in mind: By far the best way to
+    improve a program's performance <emphasis>dramatically</emphasis>
+    is to use better algorithms.  Once profiling has thrown the
+    spotlight on the guilty time-consumer(s), it may be better to
+    re-think your program than to try all the tweaks listed below.</para>
+
+    <para>Another extremely efficient way to make your program snappy
+    is to use library code that has been Seriously Tuned By Someone
+    Else.  You <emphasis>might</emphasis> be able to write a better
+    quicksort than the one in <literal>Data.List</literal>, but it
+    will take you much longer than typing <literal>import
+    Data.List</literal>.</para>
+
+    <para>Please report any overly-slow GHC-compiled programs.  Since
+    GHC doesn't have any credible competition in the performance
+    department these days, it's hard to say what &ldquo;overly
+    slow&rdquo; means, so just use your judgement!  Of course, if a
+    GHC-compiled program runs slower than the same program compiled
+    with NHC or Hugs, then it's definitely a bug.</para>
+
+    <variablelist>
+      <varlistentry>
+       <term>Optimise, using <option>-O</option> or <option>-O2</option>:</term>
+       <listitem>
+         <para>This is the most basic way to make your program go
+          faster.  Compilation time will be slower, especially with
+          <option>-O2</option>.</para>
+
+         <para>At present, <option>-O2</option> is nearly
+         indistinguishable from <option>-O</option>.</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Compile via C and crank up GCC:</term>
+       <listitem>
+         <para>The native code-generator is designed to be quick, not
+          mind-bogglingly clever.  Better to let GCC have a go, as it
+          tries much harder on register allocation, etc.</para>
+
+         <para>At the moment, if you turn on <option>-O</option> you
+          get GCC instead.  This may change in the future.</para>
+
+         <para>So, when we want very fast code, we use: <option>-O
+         -fvia-C</option>.</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Overloaded functions are not your friend:</term>
+       <listitem>
+         <para>Haskell's overloading (using type classes) is elegant,
+          neat, etc., etc., but it is death to performance if left to
+          linger in an inner loop.  How can you squash it?</para>
+
+         <variablelist>
+           <varlistentry>
+             <term>Give explicit type signatures:</term>
+             <listitem>
+               <para>Signatures are the basic trick; putting them on
+                exported, top-level functions is good
+                software-engineering practice, anyway.  (Tip: using
+                <option>-fwarn-missing-signatures</option><indexterm><primary>-fwarn-missing-signatures
+                option</primary></indexterm> can help enforce good
+                signature-practice).</para>
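+
+               <para>For example (an illustrative function, not from
+                any library), the second signature lets GHC use
+                machine <literal>Int</literal> arithmetic directly
+                instead of passing a <literal>Num</literal>
+                dictionary at run time:</para>
+
+<programlisting>
+sumSq :: Num a =&#62; [a] -&#62; a      -- overloaded
+sumSq xs = sum [ x * x | x &lt;- xs ]
+
+sumSqInt :: [Int] -&#62; Int        -- monomorphic
+sumSqInt = sumSq
+</programlisting>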
+
+               <para>The automatic specialisation of overloaded
+                functions (with <option>-O</option>) should take care
+                of overloaded local and/or unexported functions.</para>
+             </listitem>
+           </varlistentry>
+
+           <varlistentry>
+             <term>Use <literal>SPECIALIZE</literal> pragmas:</term>
+             <listitem>
+               <indexterm><primary>SPECIALIZE pragma</primary></indexterm>
+               <indexterm><primary>overloading, death to</primary></indexterm>
+
+               <para>Specialize the overloading on key functions in
+                your program.  See <xref linkend="specialize-pragma"/>
+                and <xref linkend="specialize-instance-pragma"/>.</para>
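+
+               <para>A sketch (the function here is invented for
+                illustration):</para>
+
+<programlisting>
+dotProd :: Num a =&#62; [a] -&#62; [a] -&#62; a
+dotProd xs ys = sum (zipWith (*) xs ys)
+{-# SPECIALIZE dotProd :: [Double] -&#62; [Double] -&#62; Double #-}
+</programlisting>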
+             </listitem>
+           </varlistentry>
+
+           <varlistentry>
+             <term>&ldquo;But how do I know where overloading is creeping in?&rdquo;:</term>
+             <listitem>
+               <para>A low-tech way: grep (search) your interface
+                files for overloaded type signatures.  You can view
+                interface files using the
+                <option>--show-iface</option> option (see <xref
+                linkend="hi-options"/>).
+
+<programlisting>
+% ghc --show-iface Foo.hi | egrep '^[a-z].*::.*=&#62;'
+</programlisting>
+</para>
+             </listitem>
+           </varlistentry>
+         </variablelist>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Strict functions are your dear friends:</term>
+       <listitem>
+         <para>and, among other things, lazy pattern-matching is your
+         enemy.</para>
+
+         <para>(If you don't know what a &ldquo;strict
+          function&rdquo; is, please consult a functional-programming
+          textbook.  A sentence or two of explanation here probably
+          would not do much good.)</para>
+
+         <para>Consider these two code fragments:
+
+<programlisting>
+f (Wibble x y) =  ...                       -- strict
+
+f arg = let { (Wibble x y) = arg } in ...   -- lazy
+</programlisting>
+
+           The former will result in far better code.</para>
+
+         <para>A less contrived example shows the use of
+          <literal>cases</literal> instead of <literal>lets</literal>
+          to get stricter code (a good thing):
+
+<programlisting>
+f (Wibble x y)  -- beautiful but slow
+  = let
+        (a1, b1, c1) = unpackFoo x
+        (a2, b2, c2) = unpackFoo y
+    in ...
+
+f (Wibble x y)  -- ugly, and proud of it
+  = case (unpackFoo x) of { (a1, b1, c1) -&#62;
+    case (unpackFoo y) of { (a2, b2, c2) -&#62;
+    ...
+    }}
+</programlisting>
+
+          </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>GHC loves single-constructor data-types:</term>
+       <listitem>
+         <para>It's all the better if a function is strict in a
+          single-constructor type (a type with only one
+          data-constructor; for example, tuples are single-constructor
+          types).</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Newtypes are better than datatypes:</term>
+       <listitem>
+         <para>If your datatype has a single constructor with a
+          single field, use a <literal>newtype</literal> declaration
+          instead of a <literal>data</literal> declaration.  The
+          <literal>newtype</literal> will be optimised away in most
+          cases.</para>
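+
+         <para>For instance (an illustrative type):</para>
+
+<programlisting>
+-- instead of:
+data    Velocity = MkVelocity Double
+-- prefer:
+newtype Velocity = MkVelocity Double
+</programlisting>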
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>&ldquo;How do I find out a function's strictness?&rdquo;</term>
+       <listitem>
+         <para>Don't guess&mdash;look it up.</para>
+
+         <para>Look for your function in the interface file, then for
+          the third field in the pragma; it should say
+          <literal>&lowbar;&lowbar;S &lt;string&gt;</literal>.  The
+          <literal>&lt;string&gt;</literal> gives the strictness of
+          the function's arguments.  <function>L</function> is lazy
+          (bad), <function>S</function> and <function>E</function> are
+          strict (good), <function>P</function> is
+          &ldquo;primitive&rdquo; (good), <function>U(...)</function>
+          is strict and &ldquo;unpackable&rdquo; (very good), and
+          <function>A</function> is absent (very good).</para>
+
+         <para>For an &ldquo;unpackable&rdquo;
+          <function>U(...)</function> argument, the info inside tells
+          the strictness of its components.  So, if the argument is a
+          pair, and it says <function>U(AU(LSS))</function>, that
+          means &ldquo;the first component of the pair isn't used; the
+          second component is itself unpackable, with three components
+          (lazy in the first, strict in the second &#38;
+          third).&rdquo;</para>
+
+         <para>If the function isn't exported, just compile with the
+          extra flag <option>-ddump-simpl</option>; next to the
+          signature for any binder, it will print the self-same
+          pragmatic information as would be put in an interface file.
+          (Besides, Core syntax is fun to look at!)</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Force key functions to be <literal>INLINE</literal>d (esp. monads):</term>
+       <listitem>
+         <para>Placing <literal>INLINE</literal> pragmas on certain
+          functions that are used a lot can have a dramatic effect.
+          See <xref linkend="inline-pragma"/>.</para>
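+
+         <para>A minimal sketch (the function is invented):</para>
+
+<programlisting>
+step :: Int -&#62; Int -&#62; Int
+step acc x = acc * 31 + x
+{-# INLINE step #-}
+</programlisting>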
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Explicit <literal>export</literal> list:</term>
+       <listitem>
+         <para>If you do not have an explicit export list in a
+          module, GHC must assume that everything in that module will
+          be exported.  This has various pessimising effects.  For
+          example, if a bit of code is actually
+          <emphasis>unused</emphasis> (perhaps because of unfolding
+          effects), GHC will not be able to throw it away, because it
+          is exported and some other module may be relying on its
+          existence.</para>
+
+         <para>GHC can be quite a bit more aggressive with pieces of
+          code if it knows they are not exported.</para>
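+
+         <para>For example (module and function names invented): only
+          the functions listed in the export list are visible to
+          other modules; everything else is fair game for the
+          optimiser.</para>
+
+<programlisting>
+module Wibble ( wibble, wobble ) where
+</programlisting>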
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Look at the Core syntax!</term>
+       <listitem>
+         <para>(The form in which GHC manipulates your code.)  Just
+          run your compilation with <option>-ddump-simpl</option>
+          (don't forget the <option>-O</option>).</para>
+
+         <para>If profiling has pointed the finger at particular
+          functions, look at their Core code.  <literal>lets</literal>
+          are bad, <literal>cases</literal> are good, dictionaries
+          (<literal>d.&lt;Class&gt;.&lt;Unique&gt;</literal>) &lsqb;or
+          anything overloading-ish&rsqb; are bad, nested lambdas are
+          bad, explicit data constructors are good, primitive
+          operations (e.g., <literal>eqInt&num;</literal>) are
+          good,&hellip;</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Use strictness annotations:</term>
+       <listitem>
+         <para>Putting a strictness annotation ('!') on a constructor
+         field helps in two ways: it adds strictness to the program,
+         which gives the strictness analyser more to work with, and
+         it might help to reduce space leaks.</para>
+
+         <para>It can also help in a third way: when used with
+         <option>-funbox-strict-fields</option> (see <xref
+         linkend="options-f"/>), a strict field can be unpacked or
+         unboxed in the constructor, and one or more levels of
+         indirection may be removed.  Unpacking only happens for
+         single-constructor datatypes (<literal>Int</literal> is a
+         good candidate, for example).</para>
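+
+         <para>A sketch (the type is invented for
+         illustration):</para>
+
+<programlisting>
+-- with -O -funbox-strict-fields, both Doubles are stored
+-- unboxed directly in the Point constructor
+data Point = Point !Double !Double
+</programlisting>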
+
+         <para>Using <option>-funbox-strict-fields</option> is only
+         really a good idea in conjunction with <option>-O</option>,
+         because otherwise the extra packing and unpacking won't be
+         optimised away.  In fact, it is possible that
+         <option>-funbox-strict-fields</option> may worsen
+         performance even <emphasis>with</emphasis>
+         <option>-O</option>, but this is unlikely (let us know if it
+         happens to you).</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Use unboxed types (a GHC extension):</term>
+       <listitem>
+         <para>When you are <emphasis>really</emphasis> desperate for
+          speed, and you want to get right down to the &ldquo;raw
+          bits,&rdquo; see <xref linkend="glasgow-unboxed"/> for
+          some information about using unboxed types.</para>
+
+         <para>Before resorting to explicit unboxed types, try using
+         strict constructor fields and
+         <option>-funbox-strict-fields</option> first (see above).
+         That way, your code stays portable.</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Use <literal>foreign import</literal> (a GHC extension) to plug into fast libraries:</term>
+       <listitem>
+         <para>This may take real work, but&hellip; There exist piles
+          of massively-tuned library code, and the best thing is not
+          to compete with it, but link with it.</para>
+
+         <para><xref linkend="ffi"/> describes the foreign function
+         interface.</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Don't use <literal>Float</literal>s:</term>
+       <listitem>
+         <para>If you're using <literal>Complex</literal>, definitely
+          use <literal>Complex Double</literal> rather than
+          <literal>Complex Float</literal> (the former is specialised
+          heavily, but the latter isn't).</para>
+
+         <para><literal>Float</literal>s (probably 32 bits) are
+          almost always a bad idea, anyway, unless you Really Know
+          What You Are Doing.  Use <literal>Double</literal>s.
+          There's rarely a speed disadvantage&mdash;modern machines
+          will use the same floating-point unit for both.  With
+          <literal>Double</literal>s, you are much less likely to hang
+          yourself with numerical errors.</para>
+
+         <para>One time when <literal>Float</literal> might be a good
+          idea is if you have a <emphasis>lot</emphasis> of them, say
+          a giant array of <literal>Float</literal>s.  They take up
+          half the space in the heap compared to
+          <literal>Double</literal>s.  However, this isn't true on a
+          64-bit machine.</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Use unboxed arrays (<literal>UArray</literal>)</term>
+       <listitem>
+         <para>GHC supports arrays of unboxed elements, for several
+         basic arithmetic element types including
+         <literal>Int</literal> and <literal>Char</literal>: see the
+         <literal>Data.Array.Unboxed</literal> library for details.
+         These arrays are likely to be much faster than using
+         standard Haskell 98 arrays from the
+         <literal>Data.Array</literal> library.</para>
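+
+         <para>A minimal sketch (the array contents are purely
+         illustrative):</para>
+
+<programlisting>
+import Data.Array.Unboxed (UArray, listArray, (!))
+
+squares :: UArray Int Int
+squares = listArray (0, 9) [ i * i | i &lt;- [0 .. 9] ]
+
+-- squares ! 7  evaluates to 49
+</programlisting>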
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>Use a bigger heap!</term>
+       <listitem>
+         <para>If your program's GC stats
+          (<option>-S</option><indexterm><primary>-S RTS
+          option</primary></indexterm> RTS option) indicate that it's
+          doing lots of garbage-collection (say, more than 20&percnt;
+          of execution time), more memory might help&mdash;with the
+          <option>-M&lt;size&gt;</option><indexterm><primary>-M&lt;size&gt;
+          RTS option</primary></indexterm> or
+          <option>-A&lt;size&gt;</option><indexterm><primary>-A&lt;size&gt;
+          RTS option</primary></indexterm> RTS options (see <xref
+          linkend="rts-options-gc"/>).</para>
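+
+         <para>For example, a sketch (the program name and the sizes
+          are only illustrative):</para>
+
+<programlisting>
+% ./prog +RTS -Sstderr -A16m -M512m -RTS
+</programlisting>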
+
+         <para>This is especially important if your program uses a
+         lot of mutable arrays of pointers or mutable variables
+         (i.e. <literal>STArray</literal>,
+         <literal>IOArray</literal>, <literal>STRef</literal> and
+         <literal>IORef</literal>, but not <literal>UArray</literal>,
+         <literal>STUArray</literal> or <literal>IOUArray</literal>).
+         GHC's garbage collector currently scans these objects on
+         every collection, so your program won't benefit from
+         generational GC in the normal way if you use lots of
+         these.  Increasing the heap size to reduce the number of
+         collections will probably help.</para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+
+</sect1>
+
+<sect1 id="smaller">
+<title>Smaller: producing a program that is smaller
+</title>
+
+<para>
+<indexterm><primary>smaller programs, how to produce</primary></indexterm>
+</para>
+
+<para>
+Decrease the &ldquo;go-for-it&rdquo; threshold for unfolding smallish
+expressions.  Give a
+<option>-funfolding-use-threshold0</option><indexterm><primary>-funfolding-use-threshold0
+option</primary></indexterm> option for the extreme case. (&ldquo;Only unfoldings with
+zero cost should proceed.&rdquo;)  Warning: except in certain specialised
+cases (like Happy parsers) this is likely to actually
+<emphasis>increase</emphasis> the size of your program, because unfolding
+generally enables extra simplifying optimisations to be performed.
+</para>
+
+<para>
+Avoid <function>Read</function>.
+</para>
+
+<para>
+Use <literal>strip</literal> on your executables.
+</para>
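+
+<para>
+For example (the executable name is just a placeholder):
+</para>
+
+<programlisting>
+% strip a.out
+</programlisting>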
+
+</sect1>
+
+<sect1 id="thriftier">
+<title>Thriftier: producing a program that gobbles less heap space
+</title>
+
+<para>
+<indexterm><primary>memory, using less heap</primary></indexterm>
+<indexterm><primary>space-leaks, avoiding</primary></indexterm>
+<indexterm><primary>heap space, using less</primary></indexterm>
+</para>
+
+<para>
+&ldquo;I think I have a space leak&hellip;&rdquo; Re-run your program
+with <option>+RTS -Sstderr</option>, and remove all doubt!  (You'll
+see the heap usage get bigger and bigger&hellip;)
+&lsqb;Hmmm&hellip;this might be even easier with the
+<option>-G1</option> RTS option; so&hellip; <command>./a.out +RTS
+-Sstderr -G1</command>&hellip;&rsqb;
+<indexterm><primary>-G RTS option</primary></indexterm>
+<indexterm><primary>-Sstderr RTS option</primary></indexterm>
+</para>
+
+<para>
+Once again, the profiling facilities (<xref linkend="profiling"/>) are
+the basic tool for demystifying the space behaviour of your program.
+</para>
+
+<para>
+Strict functions are good for space usage, as they are for time, as
+discussed in the previous section.  Strict functions get right down to
+business, rather than filling up the heap with closures (the system's
+notes to itself about how to evaluate something, should it eventually
+be required).
+</para>
+
+</sect1>
+
+</chapter>
+
+<!-- Emacs stuff:
+     ;;; Local Variables: ***
+     ;;; mode: xml ***
+     ;;; sgml-parent-document: ("users_guide.xml" "book" "chapter") ***
+     ;;; End: ***
+ -->