-<chapter id="sooner-faster-quicker">
-<title>Advice on: sooner, faster, smaller, thriftier</title>
-
-<para>Please advise us of other “helpful hints” that
-should go here!</para>
-
-<sect1 id="sooner">
-<title>Sooner: producing a program more quickly
-</title>
-
-<indexterm><primary>compiling faster</primary></indexterm>
-<indexterm><primary>faster compiling</primary></indexterm>
-
- <variablelist>
- <varlistentry>
- <term>Don't use <option>-O</option> or (especially) <option>-O2</option>:</term>
- <listitem>
- <para>By using them, you are telling GHC that you are
- willing to suffer longer compilation times for
- better-quality code.</para>
-
- <para>GHC is surprisingly zippy for normal compilations
- without <option>-O</option>!</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Use more memory:</term>
- <listitem>
- <para>Within reason, more memory for heap space means less
- garbage collection for GHC, which means less compilation
- time. If you use the <option>-Rghc-timing</option> option,
- you'll get a garbage-collector report. (Again, you can use
- the cheap-and-nasty <option>+RTS -Sstderr -RTS</option>
- option to send the GC stats straight to standard
- error.)</para>
-
-	  <para>If it says you're spending more than 20% of total
-	  time in garbage collection, then more memory would
-	  help.</para>
-
- <para>If the heap size is approaching the maximum (64M by
- default), and you have lots of memory, try increasing the
- maximum with the
-	  <option>-M&lt;size&gt;</option><indexterm><primary>-M&lt;size&gt;
-	  option</primary></indexterm> option, e.g.: <command>ghc -c
-	  -O -M1024m Foo.hs</command>.</para>
-
- <para>Increasing the default allocation area size used by
- the compiler's RTS might also help: use the
-	  <option>-A&lt;size&gt;</option><indexterm><primary>-A&lt;size&gt;
-	  option</primary></indexterm> option.</para>
-
- <para>If GHC persists in being a bad memory citizen, please
- report it as a bug.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Don't use too much memory!</term>
- <listitem>
- <para>As soon as GHC plus its “fellow citizens”
- (other processes on your machine) start using more than the
- <emphasis>real memory</emphasis> on your machine, and the
- machine starts “thrashing,” <emphasis>the party
- is over</emphasis>. Compile times will be worse than
- terrible! Use something like the csh-builtin
- <command>time</command> command to get a report on how many
- page faults you're getting.</para>
-
- <para>If you don't know what virtual memory, thrashing, and
- page faults are, or you don't know the memory configuration
- of your machine, <emphasis>don't</emphasis> try to be clever
- about memory use: you'll just make your life a misery (and
- for other people, too, probably).</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Try to use local disks when linking:</term>
- <listitem>
- <para>Because Haskell objects and libraries tend to be
- large, it can take many real seconds to slurp the bits
- to/from a remote filesystem.</para>
-
- <para>It would be quite sensible to
- <emphasis>compile</emphasis> on a fast machine using
- remotely-mounted disks; then <emphasis>link</emphasis> on a
-	  slow machine that has your disks directly mounted.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Don't derive/use <function>Read</function> unnecessarily:</term>
- <listitem>
- <para>It's ugly and slow.</para>
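<para>As a hedged sketch of the alternative (the format and names
here are hypothetical): for a fixed input format, a small hand-rolled
parser avoids dragging in the generic <function>Read</function>
machinery entirely.</para>

```haskell
-- Hypothetical example: parse "3 42" as a pair of non-negative Ints
-- without touching the Read class. No backtracking, no class
-- dictionaries; negative numbers are deliberately not handled.
parsePair :: String -> Maybe (Int, Int)
parsePair s =
  case words s of
    [a, b] -> Just (readNat a, readNat b)
    _      -> Nothing
  where
    readNat = foldl (\n c -> n * 10 + (fromEnum c - fromEnum '0')) 0
```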
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>GHC compiles some program constructs slowly:</term>
- <listitem>
- <para>Deeply-nested list comprehensions seem to be one such;
- in the past, very large constant tables were bad,
- too.</para>
-
- <para>We'd rather you reported such behaviour as a bug, so
- that we can try to correct it.</para>
-
- <para>The part of the compiler that is occasionally prone to
- wandering off for a long time is the strictness analyser.
- You can turn this off individually with
- <option>-fno-strictness</option>.
- <indexterm><primary>-fno-strictness
- anti-option</primary></indexterm></para>
-
- <para>To figure out which part of the compiler is badly
- behaved, the
- <option>-v2</option><indexterm><primary><option>-v</option></primary>
- </indexterm> option is your friend.</para>
-
- <para>If your module has big wads of constant data, GHC may
- produce a huge basic block that will cause the native-code
- generator's register allocator to founder. Bring on
- <option>-fvia-C</option><indexterm><primary>-fvia-C
- option</primary></indexterm> (not that GCC will be that
- quick about it, either).</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Explicit <literal>import</literal> declarations:</term>
- <listitem>
- <para>Instead of saying <literal>import Foo</literal>, say
-	  <literal>import Foo (...stuff I want...)</literal>. You can
- get GHC to tell you the minimal set of required imports by
- using the <option>-ddump-minimal-imports</option> option
- (see <xref linkend="hi-options"/>).</para>
-
- <para>Truthfully, the reduction on compilation time will be
- very small. However, judicious use of
- <literal>import</literal> declarations can make a program
- easier to understand, so it may be a good idea
- anyway.</para>
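<para>A minimal sketch (the module and names are hypothetical) of
what a trimmed import list looks like in practice:</para>

```haskell
-- Import only the handful of names actually used, rather than
-- everything that Data.List and Data.Char export.
import Data.List (sortBy)
import Data.Char (toUpper)

shout :: String -> String
shout = map toUpper

descending :: [Int] -> [Int]
descending = sortBy (flip compare)
```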
- </listitem>
- </varlistentry>
- </variablelist>
- </sect1>
-
- <sect1 id="faster">
- <title>Faster: producing a program that runs quicker</title>
-
- <indexterm><primary>faster programs, how to produce</primary></indexterm>
-
- <para>The key tools to use in making your Haskell program run
- faster are GHC's profiling facilities, described separately in
- <xref linkend="profiling"/>. There is <emphasis>no
- substitute</emphasis> for finding where your program's time/space
- is <emphasis>really</emphasis> going, as opposed to where you
- imagine it is going.</para>
-
- <para>Another point to bear in mind: By far the best way to
- improve a program's performance <emphasis>dramatically</emphasis>
- is to use better algorithms. Once profiling has thrown the
- spotlight on the guilty time-consumer(s), it may be better to
- re-think your program than to try all the tweaks listed below.</para>
-
- <para>Another extremely efficient way to make your program snappy
- is to use library code that has been Seriously Tuned By Someone
- Else. You <emphasis>might</emphasis> be able to write a better
- quicksort than the one in <literal>Data.List</literal>, but it
- will take you much longer than typing <literal>import
- Data.List</literal>.</para>
-
- <para>Please report any overly-slow GHC-compiled programs. Since
- GHC doesn't have any credible competition in the performance
- department these days, it's hard to say what “overly-slow” means, so
- just use your judgement! Of course, if a GHC-compiled program
- runs slower than the same program compiled with NHC or Hugs, then
- it's definitely a bug.</para>
-
- <variablelist>
- <varlistentry>
- <term>Optimise, using <option>-O</option> or <option>-O2</option>:</term>
- <listitem>
- <para>This is the most basic way to make your program go
- faster. Compilation time will be slower, especially with
- <option>-O2</option>.</para>
-
- <para>At present, <option>-O2</option> is nearly
- indistinguishable from <option>-O</option>.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Compile via C and crank up GCC:</term>
- <listitem>
- <para>The native code-generator is designed to be quick, not
- mind-bogglingly clever. Better to let GCC have a go, as it
- tries much harder on register allocation, etc.</para>
-
- <para>At the moment, if you turn on <option>-O</option> you
- get GCC instead. This may change in the future.</para>
-
- <para>So, when we want very fast code, we use: <option>-O
- -fvia-C</option>.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Overloaded functions are not your friend:</term>
- <listitem>
- <para>Haskell's overloading (using type classes) is elegant,
- neat, etc., etc., but it is death to performance if left to
- linger in an inner loop. How can you squash it?</para>
-
- <variablelist>
- <varlistentry>
- <term>Give explicit type signatures:</term>
- <listitem>
- <para>Signatures are the basic trick; putting them on
- exported, top-level functions is good
- software-engineering practice, anyway. (Tip: using
- <option>-fwarn-missing-signatures</option><indexterm><primary>-fwarn-missing-signatures
- option</primary></indexterm> can help enforce good
- signature-practice).</para>
-
- <para>The automatic specialisation of overloaded
- functions (with <option>-O</option>) should take care
- of overloaded local and/or unexported functions.</para>
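<para>A small sketch of the difference (the name is hypothetical):
without a signature, this definition would be inferred with an
overloaded type and pay for a <literal>Num</literal> dictionary at
every call; the monomorphic signature pins it down.</para>

```haskell
-- Without a signature, the inferred type would be
-- (Num a) => [a] -> a, which passes a class dictionary at runtime.
-- The explicit signature fixes the type, so GHC compiles direct
-- Int arithmetic instead.
sumInts :: [Int] -> Int
sumInts = foldl (+) 0
```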
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Use <literal>SPECIALIZE</literal> pragmas:</term>
- <listitem>
- <indexterm><primary>SPECIALIZE pragma</primary></indexterm>
- <indexterm><primary>overloading, death to</primary></indexterm>
-
- <para>Specialize the overloading on key functions in
- your program. See <xref linkend="specialize-pragma"/>
- and <xref linkend="specialize-instance-pragma"/>.</para>
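<para>As a sketch (the function is hypothetical), asking GHC to
compile an extra, dictionary-free copy of an overloaded function at a
key type:</para>

```haskell
-- The pragma requests a specialised Int version alongside the
-- general one; calls at type Int then avoid the Num dictionary.
-- (The specialisation only kicks in when compiling with -O.)
power :: (Num a) => a -> Int -> a
power _ 0 = 1
power x n = x * power x (n - 1)
{-# SPECIALIZE power :: Int -> Int -> Int #-}
```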
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>“But how do I know where overloading is creeping in?”:</term>
- <listitem>
- <para>A low-tech way: grep (search) your interface
- files for overloaded type signatures. You can view
- interface files using the
- <option>--show-iface</option> option (see <xref
- linkend="hi-options"/>).
-
-<programlisting>
-% ghc --show-iface Foo.hi | egrep '^[a-z].*::.*=>'
-</programlisting>
-</para>
- </listitem>
- </varlistentry>
- </variablelist>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Strict functions are your dear friends:</term>
- <listitem>
- <para>and, among other things, lazy pattern-matching is your
- enemy.</para>
-
- <para>(If you don't know what a “strict
- function” is, please consult a functional-programming
- textbook. A sentence or two of explanation here probably
- would not do much good.)</para>
-
- <para>Consider these two code fragments:
-
-<programlisting>
-f (Wibble x y) = ... -- strict
-
-f arg = let { (Wibble x y) = arg } in ... -- lazy
-</programlisting>
-
- The former will result in far better code.</para>
-
- <para>A less contrived example shows the use of
- <literal>cases</literal> instead of <literal>lets</literal>
- to get stricter code (a good thing):
-
-<programlisting>
-f (Wibble x y)  -- beautiful but slow
- = let
- (a1, b1, c1) = unpackFoo x
- (a2, b2, c2) = unpackFoo y
- in ...
-
-f (Wibble x y)  -- ugly, and proud of it
- = case (unpackFoo x) of { (a1, b1, c1) ->
- case (unpackFoo y) of { (a2, b2, c2) ->
- ...
- }}
-</programlisting>
-
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>GHC loves single-constructor data-types:</term>
- <listitem>
- <para>It's all the better if a function is strict in a
- single-constructor type (a type with only one
- data-constructor; for example, tuples are single-constructor
- types).</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Newtypes are better than datatypes:</term>
- <listitem>
- <para>If your datatype has a single constructor with a
- single field, use a <literal>newtype</literal> declaration
- instead of a <literal>data</literal> declaration. The
- <literal>newtype</literal> will be optimised away in most
- cases.</para>
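<para>For instance (the types are hypothetical):</para>

```haskell
-- Age is free at runtime: it compiles to a bare Int.
newtype Age = Age Int

-- The data version allocates an extra heap object around the Int,
-- costing an indirection on every use.
data AgeBoxed = AgeBoxed Int

birthday :: Age -> Age
birthday (Age n) = Age (n + 1)
```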
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>“How do I find out a function's strictness?”</term>
- <listitem>
- <para>Don't guess—look it up.</para>
-
- <para>Look for your function in the interface file, then for
- the third field in the pragma; it should say
-	  <literal>__S &lt;string&gt;</literal>. The
-	  <literal>&lt;string&gt;</literal> gives the strictness of
- the function's arguments. <function>L</function> is lazy
- (bad), <function>S</function> and <function>E</function> are
- strict (good), <function>P</function> is
- “primitive” (good), <function>U(...)</function>
- is strict and “unpackable” (very good), and
- <function>A</function> is absent (very good).</para>
-
- <para>For an “unpackable”
- <function>U(...)</function> argument, the info inside tells
- the strictness of its components. So, if the argument is a
- pair, and it says <function>U(AU(LSS))</function>, that
- means “the first component of the pair isn't used; the
- second component is itself unpackable, with three components
-	  (lazy in the first, strict in the second &amp;
- third).”</para>
-
- <para>If the function isn't exported, just compile with the
- extra flag <option>-ddump-simpl</option>; next to the
- signature for any binder, it will print the self-same
- pragmatic information as would be put in an interface file.
- (Besides, Core syntax is fun to look at!)</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Force key functions to be <literal>INLINE</literal>d (esp. monads):</term>
- <listitem>
- <para>Placing <literal>INLINE</literal> pragmas on certain
- functions that are used a lot can have a dramatic effect.
- See <xref linkend="inline-pragma"/>.</para>
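<para>A sketch of a typical candidate (the combinator is
hypothetical): a tiny monadic plumbing function that is called
constantly and is cheap to duplicate at each call site.</para>

```haskell
-- Inlining exposes the case analysis to the simplifier at every
-- call site, which is where the big wins in monadic code come from.
{-# INLINE andThen #-}
andThen :: Maybe a -> (a -> Maybe b) -> Maybe b
andThen Nothing  _ = Nothing
andThen (Just x) k = k x
```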
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Explicit <literal>export</literal> list:</term>
- <listitem>
- <para>If you do not have an explicit export list in a
- module, GHC must assume that everything in that module will
- be exported. This has various pessimising effects. For
- example, if a bit of code is actually
- <emphasis>unused</emphasis> (perhaps because of unfolding
- effects), GHC will not be able to throw it away, because it
- is exported and some other module may be relying on its
- existence.</para>
-
- <para>GHC can be quite a bit more aggressive with pieces of
- code if it knows they are not exported.</para>
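<para>For example (the module and names are hypothetical): only
<function>mean</function> escapes, so GHC may inline or discard the
helpers entirely.</para>

```haskell
module Stats (mean) where

-- total and count do not appear in the export list, so GHC can
-- treat them as private: inline them into mean, or drop them.
mean :: [Double] -> Double
mean xs = total xs / count xs

total :: [Double] -> Double
total = foldl (+) 0

count :: [Double] -> Double
count = fromIntegral . length
```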
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Look at the Core syntax!</term>
- <listitem>
- <para>(The form in which GHC manipulates your code.) Just
- run your compilation with <option>-ddump-simpl</option>
- (don't forget the <option>-O</option>).</para>
-
- <para>If profiling has pointed the finger at particular
- functions, look at their Core code. <literal>lets</literal>
- are bad, <literal>cases</literal> are good, dictionaries
-	  (<literal>d.&lt;Class&gt;.&lt;Unique&gt;</literal>) [or
- anything overloading-ish] are bad, nested lambdas are
- bad, explicit data constructors are good, primitive
- operations (e.g., <literal>eqInt#</literal>) are
- good,…</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Use strictness annotations:</term>
- <listitem>
- <para>Putting a strictness annotation ('!') on a constructor
- field helps in two ways: it adds strictness to the program,
- which gives the strictness analyser more to work with, and
- it might help to reduce space leaks.</para>
-
- <para>It can also help in a third way: when used with
- <option>-funbox-strict-fields</option> (see <xref
- linkend="options-f"/>), a strict field can be unpacked or
- unboxed in the constructor, and one or more levels of
- indirection may be removed. Unpacking only happens for
- single-constructor datatypes (<literal>Int</literal> is a
- good candidate, for example).</para>
-
- <para>Using <option>-funbox-strict-fields</option> is only
- really a good idea in conjunction with <option>-O</option>,
- because otherwise the extra packing and unpacking won't be
- optimised away. In fact, it is possible that
- <option>-funbox-strict-fields</option> may worsen
- performance even <emphasis>with</emphasis>
- <option>-O</option>, but this is unlikely (let us know if it
- happens to you).</para>
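<para>A sketch (the type is hypothetical): with <literal>!</literal>
on the fields and <option>-funbox-strict-fields</option>, GHC can
store both <literal>Double</literal>s directly in the
constructor.</para>

```haskell
-- Strict fields: constructing a Point evaluates both coordinates,
-- and with -funbox-strict-fields they are stored unboxed, removing
-- one level of indirection per field.
data Point = Point !Double !Double

addPoints :: Point -> Point -> Point
addPoints (Point x1 y1) (Point x2 y2) = Point (x1 + x2) (y1 + y2)
```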
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Use unboxed types (a GHC extension):</term>
- <listitem>
- <para>When you are <emphasis>really</emphasis> desperate for
- speed, and you want to get right down to the “raw
- bits.” Please see <xref linkend="glasgow-unboxed"/> for
- some information about using unboxed types.</para>
-
- <para>Before resorting to explicit unboxed types, try using
- strict constructor fields and
- <option>-funbox-strict-fields</option> first (see above).
- That way, your code stays portable.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Use <literal>foreign import</literal> (a GHC extension) to plug into fast libraries:</term>
- <listitem>
- <para>This may take real work, but… There exist piles
- of massively-tuned library code, and the best thing is not
- to compete with it, but link with it.</para>
-
- <para><xref linkend="ffi"/> describes the foreign function
- interface.</para>
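<para>A minimal sketch: binding <function>sqrt</function> from the C
maths library rather than reimplementing it. (The
<literal>unsafe</literal> keyword skips the callback bookkeeping, for
cheap calls that cannot call back into Haskell.)</para>

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

-- Link straight to libm's sqrt; no marshalling is needed because
-- Double passes through the FFI unchanged.
foreign import ccall unsafe "math.h sqrt"
  c_sqrt :: Double -> Double
```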
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Don't use <literal>Float</literal>s:</term>
- <listitem>
- <para>If you're using <literal>Complex</literal>, definitely
- use <literal>Complex Double</literal> rather than
- <literal>Complex Float</literal> (the former is specialised
- heavily, but the latter isn't).</para>
-
- <para><literal>Floats</literal> (probably 32-bits) are
- almost always a bad idea, anyway, unless you Really Know
- What You Are Doing. Use <literal>Double</literal>s.
- There's rarely a speed disadvantage—modern machines
- will use the same floating-point unit for both. With
- <literal>Double</literal>s, you are much less likely to hang
- yourself with numerical errors.</para>
-
- <para>One time when <literal>Float</literal> might be a good
- idea is if you have a <emphasis>lot</emphasis> of them, say
- a giant array of <literal>Float</literal>s. They take up
- half the space in the heap compared to
- <literal>Doubles</literal>. However, this isn't true on a
- 64-bit machine.</para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Use unboxed arrays (<literal>UArray</literal>)</term>
- <listitem>
- <para>GHC supports arrays of unboxed elements, for several
- basic arithmetic element types including
- <literal>Int</literal> and <literal>Char</literal>: see the
- <literal>Data.Array.Unboxed</literal> library for details.
- These arrays are likely to be much faster than using
- standard Haskell 98 arrays from the
- <literal>Data.Array</literal> library.</para>
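<para>For example, a flat table of squares (the name is
hypothetical):</para>

```haskell
import Data.Array.Unboxed

-- The Int elements are stored unboxed in one contiguous block:
-- no per-element heap cells, and nothing for the GC to chase.
squares :: UArray Int Int
squares = listArray (0, 10) [ i * i | i <- [0 .. 10] ]
```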
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Use a bigger heap!</term>
- <listitem>
- <para>If your program's GC stats
- (<option>-S</option><indexterm><primary>-S RTS
- option</primary></indexterm> RTS option) indicate that it's
- doing lots of garbage-collection (say, more than 20%
- of execution time), more memory might help—with the
-	  <option>-M&lt;size&gt;</option><indexterm><primary>-M&lt;size&gt;
-	  RTS option</primary></indexterm> or
-	  <option>-A&lt;size&gt;</option><indexterm><primary>-A&lt;size&gt;
-	  RTS option</primary></indexterm> RTS options (see <xref
- linkend="rts-options-gc"/>).</para>
-
- <para>This is especially important if your program uses a
- lot of mutable arrays of pointers or mutable variables
- (i.e. <literal>STArray</literal>,
- <literal>IOArray</literal>, <literal>STRef</literal> and
- <literal>IORef</literal>, but not <literal>UArray</literal>,
- <literal>STUArray</literal> or <literal>IOUArray</literal>).
- GHC's garbage collector currently scans these objects on
- every collection, so your program won't benefit from
- generational GC in the normal way if you use lots of
- these. Increasing the heap size to reduce the number of
- collections will probably help.</para>
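<para>A sketch of the invocation (the program name is
hypothetical):</para>

```shell
# Assumes an executable ./prog built by GHC. Raise the maximum heap
# to 256M, use an 8M allocation area, and send GC statistics to
# stderr so you can see whether the GC fraction actually drops.
./prog +RTS -M256m -A8m -Sstderr -RTS
```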
- </listitem>
- </varlistentry>
- </variablelist>
-
-</sect1>
-
-<sect1 id="smaller">
-<title>Smaller: producing a program that is smaller
-</title>
-
-<para>
-<indexterm><primary>smaller programs, how to produce</primary></indexterm>
-</para>
-
-<para>
-Decrease the “go-for-it” threshold for unfolding smallish
-expressions. Give a
-<option>-funfolding-use-threshold0</option><indexterm><primary>-funfolding-use-threshold0
-option</primary></indexterm> option for the extreme case. (“Only unfoldings with
-zero cost should proceed.”) Warning: except in certain specialised
-cases (like Happy parsers) this is likely to actually
-<emphasis>increase</emphasis> the size of your program, because unfolding
-generally enables extra simplifying optimisations to be performed.
-</para>
-
-<para>
-Avoid <function>Read</function>.
-</para>
-
-<para>
-Use <literal>strip</literal> on your executables.
-</para>
-
-</sect1>
-
-<sect1 id="thriftier">
-<title>Thriftier: producing a program that gobbles less heap space
-</title>
-
-<para>
-<indexterm><primary>memory, using less heap</primary></indexterm>
-<indexterm><primary>space-leaks, avoiding</primary></indexterm>
-<indexterm><primary>heap space, using less</primary></indexterm>
-</para>
-
-<para>
-“I think I have a space leak…” Re-run your program
-with <option>+RTS -Sstderr</option>, and remove all doubt! (You'll
-see the heap usage get bigger and bigger…)
-[Hmmm…this might be even easier with the
-<option>-G1</option> RTS option; so… <command>./a.out +RTS
--Sstderr -G1</command>…]
-<indexterm><primary>-G RTS option</primary></indexterm>
-<indexterm><primary>-Sstderr RTS option</primary></indexterm>
-</para>
-
-<para>
-Once again, the profiling facilities (<xref linkend="profiling"/>) are
-the basic tool for demystifying the space behaviour of your program.
-</para>
-
-<para>
-Strict functions are good for space usage, as they are for time, as
-discussed in the previous section. Strict functions get right down to
-business, rather than filling up the heap with closures (the system's
-notes to itself about how to evaluate something, should it eventually
-be required).
-</para>
-
-</sect1>
-
-</chapter>
-
-<!-- Emacs stuff:
- ;;; Local Variables: ***
- ;;; mode: sgml ***
- ;;; sgml-parent-document: ("users_guide.sgml" "book" "chapter") ***
- ;;; End: ***
- -->