--- /dev/null
+<?xml version="1.0" encoding="iso-8859-1"?>
+<chapter id="sooner-faster-quicker">
+<title>Advice on: sooner, faster, smaller, thriftier</title>
+
+<para>Please advise us of other “helpful hints” that
+should go here!</para>
+
+<sect1 id="sooner">
+<title>Sooner: producing a program more quickly
+</title>
+
+<indexterm><primary>compiling faster</primary></indexterm>
+<indexterm><primary>faster compiling</primary></indexterm>
+
+ <variablelist>
+ <varlistentry>
+ <term>Don't use <option>-O</option> or (especially) <option>-O2</option>:</term>
+ <listitem>
+ <para>By using them, you are telling GHC that you are
+ willing to suffer longer compilation times for
+ better-quality code.</para>
+
+ <para>GHC is surprisingly zippy for normal compilations
+ without <option>-O</option>!</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Use more memory:</term>
+ <listitem>
+ <para>Within reason, more memory for heap space means less
+ garbage collection for GHC, which means less compilation
+ time. If you use the <option>-Rghc-timing</option> option,
+ you'll get a garbage-collector report. (Again, you can use
+ the cheap-and-nasty <option>+RTS -Sstderr -RTS</option>
+ option to send the GC stats straight to standard
+ error.)</para>
+
+ <para>If it says you're using more than 20% of total
+ time in garbage collecting, then more memory would
+ help.</para>
+
+ <para>If the heap size is approaching the maximum (64M by
+ default), and you have lots of memory, try increasing the
+ maximum with the
+          <option>-M&lt;size&gt;</option><indexterm><primary>-M&lt;size&gt;
+          option</primary></indexterm> option, e.g.: <command>ghc -c
+          -O -M1024m Foo.hs</command>.</para>
+
+ <para>Increasing the default allocation area size used by
+ the compiler's RTS might also help: use the
+          <option>-A&lt;size&gt;</option><indexterm><primary>-A&lt;size&gt;
+          option</primary></indexterm> option.</para>
+
+ <para>If GHC persists in being a bad memory citizen, please
+ report it as a bug.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Don't use too much memory!</term>
+ <listitem>
+ <para>As soon as GHC plus its “fellow citizens”
+ (other processes on your machine) start using more than the
+ <emphasis>real memory</emphasis> on your machine, and the
+ machine starts “thrashing,” <emphasis>the party
+ is over</emphasis>. Compile times will be worse than
+ terrible! Use something like the csh-builtin
+ <command>time</command> command to get a report on how many
+ page faults you're getting.</para>
+
+ <para>If you don't know what virtual memory, thrashing, and
+ page faults are, or you don't know the memory configuration
+ of your machine, <emphasis>don't</emphasis> try to be clever
+ about memory use: you'll just make your life a misery (and
+ for other people, too, probably).</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Try to use local disks when linking:</term>
+ <listitem>
+ <para>Because Haskell objects and libraries tend to be
+ large, it can take many real seconds to slurp the bits
+ to/from a remote filesystem.</para>
+
+ <para>It would be quite sensible to
+ <emphasis>compile</emphasis> on a fast machine using
+ remotely-mounted disks; then <emphasis>link</emphasis> on a
+          slow machine that has your disks directly mounted.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Don't derive/use <function>Read</function> unnecessarily:</term>
+ <listitem>
+ <para>It's ugly and slow.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>GHC compiles some program constructs slowly:</term>
+ <listitem>
+ <para>Deeply-nested list comprehensions seem to be one such;
+ in the past, very large constant tables were bad,
+ too.</para>
+
+ <para>We'd rather you reported such behaviour as a bug, so
+ that we can try to correct it.</para>
+
+ <para>The part of the compiler that is occasionally prone to
+ wandering off for a long time is the strictness analyser.
+ You can turn this off individually with
+ <option>-fno-strictness</option>.
+ <indexterm><primary>-fno-strictness
+ anti-option</primary></indexterm></para>
+
+ <para>To figure out which part of the compiler is badly
+ behaved, the
+ <option>-v2</option><indexterm><primary><option>-v</option></primary>
+ </indexterm> option is your friend.</para>
+
+ <para>If your module has big wads of constant data, GHC may
+ produce a huge basic block that will cause the native-code
+ generator's register allocator to founder. Bring on
+ <option>-fvia-C</option><indexterm><primary>-fvia-C
+ option</primary></indexterm> (not that GCC will be that
+ quick about it, either).</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Explicit <literal>import</literal> declarations:</term>
+ <listitem>
+ <para>Instead of saying <literal>import Foo</literal>, say
+          <literal>import Foo (...stuff I want...)</literal>.  You can
+ get GHC to tell you the minimal set of required imports by
+ using the <option>-ddump-minimal-imports</option> option
+ (see <xref linkend="hi-options"/>).</para>
+
+          <para>Truthfully, the reduction in compilation time will be
+ very small. However, judicious use of
+ <literal>import</literal> declarations can make a program
+ easier to understand, so it may be a good idea
+ anyway.</para>
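+
+          <para>For example (the imported names here are just an
+          illustration):</para>
+
+<programlisting>
+-- instead of:   import Data.List
+import Data.List (sortBy, groupBy, nub)
+</programlisting>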
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </sect1>
+
+ <sect1 id="faster">
+ <title>Faster: producing a program that runs quicker</title>
+
+ <indexterm><primary>faster programs, how to produce</primary></indexterm>
+
+    <para>The key tools to use in making your Haskell program run
+    faster are GHC's profiling facilities, described separately in
+ <xref linkend="profiling"/>. There is <emphasis>no
+ substitute</emphasis> for finding where your program's time/space
+ is <emphasis>really</emphasis> going, as opposed to where you
+ imagine it is going.</para>
+
+ <para>Another point to bear in mind: By far the best way to
+ improve a program's performance <emphasis>dramatically</emphasis>
+ is to use better algorithms. Once profiling has thrown the
+ spotlight on the guilty time-consumer(s), it may be better to
+ re-think your program than to try all the tweaks listed below.</para>
+
+ <para>Another extremely efficient way to make your program snappy
+ is to use library code that has been Seriously Tuned By Someone
+ Else. You <emphasis>might</emphasis> be able to write a better
+ quicksort than the one in <literal>Data.List</literal>, but it
+ will take you much longer than typing <literal>import
+ Data.List</literal>.</para>
+
+ <para>Please report any overly-slow GHC-compiled programs. Since
+ GHC doesn't have any credible competition in the performance
+ department these days it's hard to say what overly-slow means, so
+    just use your judgement! Of course, if a GHC-compiled program
+ runs slower than the same program compiled with NHC or Hugs, then
+ it's definitely a bug.</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>Optimise, using <option>-O</option> or <option>-O2</option>:</term>
+ <listitem>
+ <para>This is the most basic way to make your program go
+ faster. Compilation time will be slower, especially with
+ <option>-O2</option>.</para>
+
+ <para>At present, <option>-O2</option> is nearly
+ indistinguishable from <option>-O</option>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Compile via C and crank up GCC:</term>
+ <listitem>
+ <para>The native code-generator is designed to be quick, not
+ mind-bogglingly clever. Better to let GCC have a go, as it
+ tries much harder on register allocation, etc.</para>
+
+ <para>At the moment, if you turn on <option>-O</option> you
+ get GCC instead. This may change in the future.</para>
+
+ <para>So, when we want very fast code, we use: <option>-O
+ -fvia-C</option>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Overloaded functions are not your friend:</term>
+ <listitem>
+ <para>Haskell's overloading (using type classes) is elegant,
+ neat, etc., etc., but it is death to performance if left to
+ linger in an inner loop. How can you squash it?</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>Give explicit type signatures:</term>
+ <listitem>
+ <para>Signatures are the basic trick; putting them on
+ exported, top-level functions is good
+ software-engineering practice, anyway. (Tip: using
+ <option>-fwarn-missing-signatures</option><indexterm><primary>-fwarn-missing-signatures
+ option</primary></indexterm> can help enforce good
+ signature-practice).</para>
+
+ <para>The automatic specialisation of overloaded
+ functions (with <option>-O</option>) should take care
+ of overloaded local and/or unexported functions.</para>
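+
+            <para>For example, a signature removes the overloading
+            from a function like this (the function itself is just an
+            illustration):</para>
+
+<programlisting>
+-- Without a signature, GHC infers the overloaded type
+--   sumSquares :: Num a => [a] -> a
+-- and calls may pay for a dictionary.  Pin it down:
+sumSquares :: [Int] -> Int
+sumSquares = sum . map (\x -> x * x)
+</programlisting>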
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Use <literal>SPECIALIZE</literal> pragmas:</term>
+ <listitem>
+ <indexterm><primary>SPECIALIZE pragma</primary></indexterm>
+ <indexterm><primary>overloading, death to</primary></indexterm>
+
+ <para>Specialize the overloading on key functions in
+ your program. See <xref linkend="specialize-pragma"/>
+ and <xref linkend="specialize-instance-pragma"/>.</para>
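+
+            <para>For example (an illustrative function):</para>
+
+<programlisting>
+power :: Num a => a -> Int -> a
+power _ 0 = 1
+power x n = x * power x (n - 1)
+{-# SPECIALIZE power :: Double -> Int -> Double #-}
+</programlisting>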
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>“But how do I know where overloading is creeping in?”:</term>
+ <listitem>
+ <para>A low-tech way: grep (search) your interface
+ files for overloaded type signatures. You can view
+ interface files using the
+ <option>--show-iface</option> option (see <xref
+ linkend="hi-options"/>).
+
+<programlisting>
+% ghc --show-iface Foo.hi | egrep '^[a-z].*::.*=>'
+</programlisting>
+</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Strict functions are your dear friends:</term>
+ <listitem>
+ <para>and, among other things, lazy pattern-matching is your
+ enemy.</para>
+
+ <para>(If you don't know what a “strict
+ function” is, please consult a functional-programming
+ textbook. A sentence or two of explanation here probably
+ would not do much good.)</para>
+
+ <para>Consider these two code fragments:
+
+<programlisting>
+f (Wibble x y) = ... -- strict
+
+f arg = let { (Wibble x y) = arg } in ... -- lazy
+</programlisting>
+
+ The former will result in far better code.</para>
+
+ <para>A less contrived example shows the use of
+ <literal>cases</literal> instead of <literal>lets</literal>
+ to get stricter code (a good thing):
+
+<programlisting>
+f (Wibble x y) -- beautiful but slow
+ = let
+ (a1, b1, c1) = unpackFoo x
+ (a2, b2, c2) = unpackFoo y
+ in ...
+
+f (Wibble x y) -- ugly, and proud of it
+ = case (unpackFoo x) of { (a1, b1, c1) ->
+ case (unpackFoo y) of { (a2, b2, c2) ->
+ ...
+ }}
+</programlisting>
+
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>GHC loves single-constructor data-types:</term>
+ <listitem>
+ <para>It's all the better if a function is strict in a
+ single-constructor type (a type with only one
+ data-constructor; for example, tuples are single-constructor
+ types).</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Newtypes are better than datatypes:</term>
+ <listitem>
+ <para>If your datatype has a single constructor with a
+ single field, use a <literal>newtype</literal> declaration
+ instead of a <literal>data</literal> declaration. The
+ <literal>newtype</literal> will be optimised away in most
+ cases.</para>
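+
+          <para>For example (an illustrative type):</para>
+
+<programlisting>
+-- instead of:   data Age = Age Int
+newtype Age = Age Int
+-- the Age wrapper now costs nothing at runtime
+</programlisting>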
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>“How do I find out a function's strictness?”</term>
+ <listitem>
+ <para>Don't guess—look it up.</para>
+
+ <para>Look for your function in the interface file, then for
+ the third field in the pragma; it should say
+          <literal>__S &lt;string&gt;</literal>.  The
+          <literal>&lt;string&gt;</literal> gives the strictness of
+ the function's arguments. <function>L</function> is lazy
+ (bad), <function>S</function> and <function>E</function> are
+ strict (good), <function>P</function> is
+ “primitive” (good), <function>U(...)</function>
+ is strict and “unpackable” (very good), and
+ <function>A</function> is absent (very good).</para>
+
+ <para>For an “unpackable”
+ <function>U(...)</function> argument, the info inside tells
+ the strictness of its components. So, if the argument is a
+ pair, and it says <function>U(AU(LSS))</function>, that
+ means “the first component of the pair isn't used; the
+ second component is itself unpackable, with three components
+          (lazy in the first, strict in the second &amp;
+ third).”</para>
+
+ <para>If the function isn't exported, just compile with the
+ extra flag <option>-ddump-simpl</option>; next to the
+ signature for any binder, it will print the self-same
+ pragmatic information as would be put in an interface file.
+ (Besides, Core syntax is fun to look at!)</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Force key functions to be <literal>INLINE</literal>d (esp. monads):</term>
+ <listitem>
+ <para>Placing <literal>INLINE</literal> pragmas on certain
+ functions that are used a lot can have a dramatic effect.
+ See <xref linkend="inline-pragma"/>.</para>
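+
+          <para>For example (an illustrative function):</para>
+
+<programlisting>
+-- a small, heavily-used function worth inlining at every call site
+square :: Int -> Int
+square x = x * x
+{-# INLINE square #-}
+</programlisting>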
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Explicit <literal>export</literal> list:</term>
+ <listitem>
+ <para>If you do not have an explicit export list in a
+ module, GHC must assume that everything in that module will
+ be exported. This has various pessimising effects. For
+ example, if a bit of code is actually
+ <emphasis>unused</emphasis> (perhaps because of unfolding
+ effects), GHC will not be able to throw it away, because it
+ is exported and some other module may be relying on its
+ existence.</para>
+
+ <para>GHC can be quite a bit more aggressive with pieces of
+ code if it knows they are not exported.</para>
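+
+          <para>For example (module and function names are
+          illustrative):</para>
+
+<programlisting>
+module Parser ( parseExpr, ParseError ) where
+-- everything else in the module is private, so GHC may
+-- inline, specialise, or discard it as it sees fit
+</programlisting>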
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Look at the Core syntax!</term>
+ <listitem>
+ <para>(The form in which GHC manipulates your code.) Just
+ run your compilation with <option>-ddump-simpl</option>
+ (don't forget the <option>-O</option>).</para>
+
+ <para>If profiling has pointed the finger at particular
+ functions, look at their Core code. <literal>lets</literal>
+ are bad, <literal>cases</literal> are good, dictionaries
+          (<literal>d.&lt;Class&gt;.&lt;Unique&gt;</literal>) [or
+ anything overloading-ish] are bad, nested lambdas are
+ bad, explicit data constructors are good, primitive
+ operations (e.g., <literal>eqInt#</literal>) are
+ good,…</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Use strictness annotations:</term>
+ <listitem>
+ <para>Putting a strictness annotation ('!') on a constructor
+ field helps in two ways: it adds strictness to the program,
+ which gives the strictness analyser more to work with, and
+ it might help to reduce space leaks.</para>
+
+ <para>It can also help in a third way: when used with
+ <option>-funbox-strict-fields</option> (see <xref
+ linkend="options-f"/>), a strict field can be unpacked or
+ unboxed in the constructor, and one or more levels of
+ indirection may be removed. Unpacking only happens for
+ single-constructor datatypes (<literal>Int</literal> is a
+ good candidate, for example).</para>
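+
+          <para>For example (an illustrative type):</para>
+
+<programlisting>
+data Point = Point !Double !Double
+-- with -O -funbox-strict-fields, both Doubles are stored
+-- directly in the Point constructor, with no indirection
+</programlisting>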
+
+ <para>Using <option>-funbox-strict-fields</option> is only
+ really a good idea in conjunction with <option>-O</option>,
+ because otherwise the extra packing and unpacking won't be
+ optimised away. In fact, it is possible that
+ <option>-funbox-strict-fields</option> may worsen
+ performance even <emphasis>with</emphasis>
+ <option>-O</option>, but this is unlikely (let us know if it
+ happens to you).</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Use unboxed types (a GHC extension):</term>
+ <listitem>
+ <para>When you are <emphasis>really</emphasis> desperate for
+ speed, and you want to get right down to the “raw
+ bits.” Please see <xref linkend="glasgow-unboxed"/> for
+ some information about using unboxed types.</para>
+
+ <para>Before resorting to explicit unboxed types, try using
+ strict constructor fields and
+ <option>-funbox-strict-fields</option> first (see above).
+ That way, your code stays portable.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Use <literal>foreign import</literal> (a GHC extension) to plug into fast libraries:</term>
+ <listitem>
+ <para>This may take real work, but… There exist piles
+ of massively-tuned library code, and the best thing is not
+ to compete with it, but link with it.</para>
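+
+          <para>For example, calling the C library's
+          <function>sin</function> directly:</para>
+
+<programlisting>
+foreign import ccall unsafe "math.h sin"
+   c_sin :: Double -> Double
+</programlisting>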
+
+ <para><xref linkend="ffi"/> describes the foreign function
+ interface.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Don't use <literal>Float</literal>s:</term>
+ <listitem>
+ <para>If you're using <literal>Complex</literal>, definitely
+ use <literal>Complex Double</literal> rather than
+ <literal>Complex Float</literal> (the former is specialised
+ heavily, but the latter isn't).</para>
+
+ <para><literal>Floats</literal> (probably 32-bits) are
+ almost always a bad idea, anyway, unless you Really Know
+ What You Are Doing. Use <literal>Double</literal>s.
+ There's rarely a speed disadvantage—modern machines
+ will use the same floating-point unit for both. With
+ <literal>Double</literal>s, you are much less likely to hang
+ yourself with numerical errors.</para>
+
+ <para>One time when <literal>Float</literal> might be a good
+ idea is if you have a <emphasis>lot</emphasis> of them, say
+ a giant array of <literal>Float</literal>s. They take up
+ half the space in the heap compared to
+ <literal>Doubles</literal>. However, this isn't true on a
+ 64-bit machine.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Use unboxed arrays (<literal>UArray</literal>)</term>
+ <listitem>
+ <para>GHC supports arrays of unboxed elements, for several
+ basic arithmetic element types including
+ <literal>Int</literal> and <literal>Char</literal>: see the
+ <literal>Data.Array.Unboxed</literal> library for details.
+ These arrays are likely to be much faster than using
+ standard Haskell 98 arrays from the
+ <literal>Data.Array</literal> library.</para>
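+
+          <para>For example (a sketch):</para>
+
+<programlisting>
+import Data.Array.Unboxed
+
+squares :: UArray Int Int
+squares = listArray (1, 100) (map (\i -> i * i) [1..100])
+</programlisting>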
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Use a bigger heap!</term>
+ <listitem>
+ <para>If your program's GC stats
+ (<option>-S</option><indexterm><primary>-S RTS
+ option</primary></indexterm> RTS option) indicate that it's
+ doing lots of garbage-collection (say, more than 20%
+ of execution time), more memory might help—with the
+          <option>-M&lt;size&gt;</option><indexterm><primary>-M&lt;size&gt;
+          RTS option</primary></indexterm> or
+          <option>-A&lt;size&gt;</option><indexterm><primary>-A&lt;size&gt;
+          RTS option</primary></indexterm> RTS options (see <xref
+ linkend="rts-options-gc"/>).</para>
+
+ <para>This is especially important if your program uses a
+ lot of mutable arrays of pointers or mutable variables
+ (i.e. <literal>STArray</literal>,
+ <literal>IOArray</literal>, <literal>STRef</literal> and
+ <literal>IORef</literal>, but not <literal>UArray</literal>,
+ <literal>STUArray</literal> or <literal>IOUArray</literal>).
+ GHC's garbage collector currently scans these objects on
+ every collection, so your program won't benefit from
+ generational GC in the normal way if you use lots of
+ these. Increasing the heap size to reduce the number of
+ collections will probably help.</para>
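+
+          <para>For example, to run with a 256-megabyte maximum heap
+          and a 16-megabyte allocation area (sizes here are purely
+          illustrative):</para>
+
+<programlisting>
+./a.out +RTS -M256m -A16m -RTS
+</programlisting>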
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+</sect1>
+
+<sect1 id="smaller">
+<title>Smaller: producing a program that is smaller
+</title>
+
+<para>
+<indexterm><primary>smaller programs, how to produce</primary></indexterm>
+</para>
+
+<para>
+Decrease the “go-for-it” threshold for unfolding smallish
+expressions. Give a
+<option>-funfolding-use-threshold0</option><indexterm><primary>-funfolding-use-threshold0
+option</primary></indexterm> option for the extreme case. (“Only unfoldings with
+zero cost should proceed.”) Warning: except in certain specialised
+cases (like Happy parsers) this is likely to actually
+<emphasis>increase</emphasis> the size of your program, because unfolding
+generally enables extra simplifying optimisations to be performed.
+</para>
+
+<para>
+Avoid <function>Read</function>.
+</para>
+
+<para>
+Use <literal>strip</literal> on your executables.
+</para>
+
+</sect1>
+
+<sect1 id="thriftier">
+<title>Thriftier: producing a program that gobbles less heap space
+</title>
+
+<para>
+<indexterm><primary>memory, using less heap</primary></indexterm>
+<indexterm><primary>space-leaks, avoiding</primary></indexterm>
+<indexterm><primary>heap space, using less</primary></indexterm>
+</para>
+
+<para>
+“I think I have a space leak…” Re-run your program
+with <option>+RTS -Sstderr</option>, and remove all doubt! (You'll
+see the heap usage get bigger and bigger…)
+[Hmmm…this might be even easier with the
+<option>-G1</option> RTS option; so… <command>./a.out +RTS
+-Sstderr -G1</command>...]
+<indexterm><primary>-G RTS option</primary></indexterm>
+<indexterm><primary>-Sstderr RTS option</primary></indexterm>
+</para>
+
+<para>
+Once again, the profiling facilities (<xref linkend="profiling"/>) are
+the basic tool for demystifying the space behaviour of your program.
+</para>
+
+<para>
+Strict functions are good for space usage, as they are for time, as
+discussed in the previous section. Strict functions get right down to
+business, rather than filling up the heap with closures (the system's
+notes to itself about how to evaluate something, should it eventually
+be required).
+</para>
+
+</sect1>
+
+</chapter>
+
+<!-- Emacs stuff:
+ ;;; Local Variables: ***
+ ;;; mode: xml ***
+ ;;; sgml-parent-document: ("users_guide.xml" "book" "chapter") ***
+ ;;; End: ***
+ -->