1 <chapter id="sooner-faster-quicker">
2 <title>Advice on: sooner, faster, smaller, thriftier</title>
4 <para>Please advise us of other “helpful hints” that
8 <title>Sooner: producing a program more quickly
11 <indexterm><primary>compiling faster</primary></indexterm>
12 <indexterm><primary>faster compiling</primary></indexterm>
16 <term>Don't use <option>-O</option> or (especially) <option>-O2</option>:</term>
18 <para>By using them, you are telling GHC that you are
19 willing to suffer longer compilation times for
20 better-quality code.</para>
22 <para>GHC is surprisingly zippy for normal compilations
23 without <option>-O</option>!</para>
28 <term>Use more memory:</term>
30 <para>Within reason, more memory for heap space means less
31 garbage collection for GHC, which means less compilation
32 time. If you use the <option>-Rghc-timing</option> option,
33 you'll get a garbage-collector report. (Again, you can use
34 the cheap-and-nasty <option>+RTS -Sstderr -RTS</option>
35 option to send the GC stats straight to standard
38 <para>If it says you're using more than 20% of total
39 time in garbage collecting, then more memory would
42 <para>If the heap size is approaching the maximum (64M by
43 default), and you have lots of memory, try increasing the
45 <option>-M<size></option><indexterm><primary>-M<size>
46 option</primary></indexterm> option, e.g.: <Command>ghc -c
47 -O -M1024m Foo.hs</Command>.</para>
49 <para>Increasing the default allocation area size used by
50 the compiler's RTS might also help: use the
51 <option>-A<size></option><indexterm><primary>-A<size>
52 option</primary></indexterm> option.</para>
54 <para>If GHC persists in being a bad memory citizen, please
55 report it as a bug.</para>
60 <term>Don't use too much memory!</term>
62 <para>As soon as GHC plus its “fellow citizens”
63 (other processes on your machine) start using more than the
64 <Emphasis>real memory</Emphasis> on your machine, and the
65 machine starts “thrashing,” <Emphasis>the party
66 is over</Emphasis>. Compile times will be worse than
67 terrible! Use something like the csh-builtin
68 <Command>time</Command> command to get a report on how many
69 page faults you're getting.</para>
71 <para>If you don't know what virtual memory, thrashing, and
72 page faults are, or you don't know the memory configuration
73 of your machine, <Emphasis>don't</Emphasis> try to be clever
74 about memory use: you'll just make your life a misery (and
75 for other people, too, probably).</para>
80 <term>Try to use local disks when linking:</term>
82 <para>Because Haskell objects and libraries tend to be
83 large, it can take many real seconds to slurp the bits
84 to/from a remote filesystem.</para>
86 <para>It would be quite sensible to
87 <Emphasis>compile</Emphasis> on a fast machine using
88 remotely-mounted disks; then <Emphasis>link</Emphasis> on a
89 slow machine that had your disks directly mounted.</para>
94 <term>Don't derive/use <Function>Read</Function> unnecessarily:</term>
96 <para>It's ugly and slow.</para>
101 <term>GHC compiles some program constructs slowly:</term>
103 <para>Deeply-nested list comprehensions seem to be one such;
104 in the past, very large constant tables were bad,
107 <para>We'd rather you reported such behaviour as a bug, so
108 that we can try to correct it.</para>
110 <para>The part of the compiler that is occasionally prone to
111 wandering off for a long time is the strictness analyser.
112 You can turn this off individually with
113 <option>-fno-strictness</option>.
114 <indexterm><primary>-fno-strictness
115 anti-option</primary></indexterm></para>
117 <para>To figure out which part of the compiler is badly
119 <option>-v2</option><indexterm><primary><option>-v</option></primary>
120 </indexterm> option is your friend.</para>
122 <para>If your module has big wads of constant data, GHC may
123 produce a huge basic block that will cause the native-code
124 generator's register allocator to founder. Bring on
125 <option>-fvia-C</option><indexterm><primary>-fvia-C
126 option</primary></indexterm> (not that GCC will be that
127 quick about it, either).</para>
132 <term>Explicit <literal>import</literal> declarations:</term>
134 <para>Instead of saying <literal>import Foo</literal>, say
135 <literal>import Foo (...stuff I want...)</literal> You can
136 get GHC to tell you the minimal set of required imports by
137 using the <option>-ddump-minimal-imports</option> option
138 (see <xref linkend="hi-options">).</para>
140 <para>Truthfully, the reduction on compilation time will be
141 very small. However, judicious use of
142 <literal>import</literal> declarations can make a program
143 easier to understand, so it may be a good idea
151 <title>Faster: producing a program that runs quicker</title>
153 <indexterm><primary>faster programs, how to produce</primary></indexterm>
155 <para>The key tool to use in making your Haskell program run
156 faster are GHC's profiling facilities, described separately in
157 <XRef LinkEnd="profiling">. There is <Emphasis>no
158 substitute</Emphasis> for finding where your program's time/space
159 is <Emphasis>really</Emphasis> going, as opposed to where you
160 imagine it is going.</para>
162 <para>Another point to bear in mind: By far the best way to
163 improve a program's performance <Emphasis>dramatically</Emphasis>
164 is to use better algorithms. Once profiling has thrown the
165 spotlight on the guilty time-consumer(s), it may be better to
166 re-think your program than to try all the tweaks listed below.</para>
168 <para>Another extremely efficient way to make your program snappy
169 is to use library code that has been Seriously Tuned By Someone
170 Else. You <Emphasis>might</Emphasis> be able to write a better
171 quicksort than the one in <literal>Data.List</literal>, but it
172 will take you much longer than typing <literal>import
173 Data.List</literal>.</para>
175 <para>Please report any overly-slow GHC-compiled programs. Since
176 GHC doesn't have any credible competition in the performance
177 department these days it's hard to say what overly-slow means, so
178 just use your judgement! Of course, if a GHC compiled program
179 runs slower than the same program compiled with NHC or Hugs, then
180 it's definitely a bug.</para>
184 <term>Optimise, using <option>-O</option> or <option>-O2</option>:</term>
186 <para>This is the most basic way to make your program go
187 faster. Compilation time will be slower, especially with
188 <option>-O2</option>.</para>
190 <para>At present, <option>-O2</option> is nearly
191 indistinguishable from <option>-O</option>.</para>
196 <term>Compile via C and crank up GCC:</term>
198 <para>The native code-generator is designed to be quick, not
199 mind-bogglingly clever. Better to let GCC have a go, as it
200 tries much harder on register allocation, etc.</para>
202 <para>At the moment, if you turn on <option>-O</option> you
203 get GCC instead. This may change in the future.</para>
205 <para>So, when we want very fast code, we use: <option>-O
206 -fvia-C</option>.</para>
211 <term>Overloaded functions are not your friend:</term>
213 <para>Haskell's overloading (using type classes) is elegant,
214 neat, etc., etc., but it is death to performance if left to
215 linger in an inner loop. How can you squash it?</para>
219 <term>Give explicit type signatures:</term>
221 <para>Signatures are the basic trick; putting them on
222 exported, top-level functions is good
223 software-engineering practice, anyway. (Tip: using
224 <option>-fwarn-missing-signatures</option><indexterm><primary>-fwarn-missing-signatures
225 option</primary></indexterm> can help enforce good
226 signature-practice).</para>
228 <para>The automatic specialisation of overloaded
229 functions (with <option>-O</option>) should take care
230 of overloaded local and/or unexported functions.</para>
235 <term>Use <literal>SPECIALIZE</literal> pragmas:</term>
237 <indexterm><primary>SPECIALIZE pragma</primary></indexterm>
238 <indexterm><primary>overloading, death to</primary></indexterm>
240 <para>Specialize the overloading on key functions in
241 your program. See <XRef LinkEnd="specialize-pragma">
242 and <XRef LinkEnd="specialize-instance-pragma">.</para>
247 <term>“But how do I know where overloading is creeping in?”:</term>
249 <para>A low-tech way: grep (search) your interface
250 files for overloaded type signatures. You can view
251 interface files using the
252 <option>--show-iface</option> option (see <xref
253 linkend="hi-options">).
256 % ghc --show-iface Foo.hi | egrep '^[a-z].*::.*=>'
266 <term>Strict functions are your dear friends:</term>
268 <para>and, among other things, lazy pattern-matching is your
271 <para>(If you don't know what a “strict
272 function” is, please consult a functional-programming
273 textbook. A sentence or two of explanation here probably
274 would not do much good.)</para>
276 <para>Consider these two code fragments:
279 f (Wibble x y) = ... # strict
281 f arg = let { (Wibble x y) = arg } in ... # lazy
284 The former will result in far better code.</para>
286 <para>A less contrived example shows the use of
287 <literal>cases</literal> instead of <literal>lets</literal>
288 to get stricter code (a good thing):
291 f (Wibble x y) # beautiful but slow
293 (a1, b1, c1) = unpackFoo x
294 (a2, b2, c2) = unpackFoo y
297 f (Wibble x y) # ugly, and proud of it
298 = case (unpackFoo x) of { (a1, b1, c1) ->
299 case (unpackFoo y) of { (a2, b2, c2) ->
309 <term>GHC loves single-constructor data-types:</term>
311 <para>It's all the better if a function is strict in a
312 single-constructor type (a type with only one
313 data-constructor; for example, tuples are single-constructor
319 <term>Newtypes are better than datatypes:</term>
321 <para>If your datatype has a single constructor with a
322 single field, use a <literal>newtype</literal> declaration
323 instead of a <literal>data</literal> declaration. The
324 <literal>newtype</literal> will be optimised away in most
330 <term>“How do I find out a function's strictness?”</term>
332 <para>Don't guess—look it up.</para>
334 <para>Look for your function in the interface file, then for
335 the third field in the pragma; it should say
336 <literal>__S <string></literal>. The
337 <literal><string></literal> gives the strictness of
338 the function's arguments. <Function>L</Function> is lazy
339 (bad), <Function>S</Function> and <Function>E</Function> are
340 strict (good), <Function>P</Function> is
341 “primitive” (good), <Function>U(...)</Function>
342 is strict and “unpackable” (very good), and
343 <Function>A</Function> is absent (very good).</para>
345 <para>For an “unpackable”
346 <Function>U(...)</Function> argument, the info inside tells
347 the strictness of its components. So, if the argument is a
348 pair, and it says <Function>U(AU(LSS))</Function>, that
349 means “the first component of the pair isn't used; the
350 second component is itself unpackable, with three components
351 (lazy in the first, strict in the second \&
352 third).”</para>
354 <para>If the function isn't exported, just compile with the
355 extra flag <option>-ddump-simpl</option>; next to the
356 signature for any binder, it will print the self-same
357 pragmatic information as would be put in an interface file.
358 (Besides, Core syntax is fun to look at!)</para>
363 <term>Force key functions to be <literal>INLINE</literal>d (esp. monads):</term>
365 <para>Placing <literal>INLINE</literal> pragmas on certain
366 functions that are used a lot can have a dramatic effect.
367 See <XRef LinkEnd="inline-pragma">.</para>
372 <term>Explicit <literal>export</literal> list:</term>
374 <para>If you do not have an explicit export list in a
375 module, GHC must assume that everything in that module will
376 be exported. This has various pessimising effects. For
377 example, if a bit of code is actually
378 <Emphasis>unused</Emphasis> (perhaps because of unfolding
379 effects), GHC will not be able to throw it away, because it
380 is exported and some other module may be relying on its
383 <para>GHC can be quite a bit more aggressive with pieces of
384 code if it knows they are not exported.</para>
389 <term>Look at the Core syntax!</term>
391 <para>(The form in which GHC manipulates your code.) Just
392 run your compilation with <option>-ddump-simpl</option>
393 (don't forget the <option>-O</option>).</para>
395 <para>If profiling has pointed the finger at particular
396 functions, look at their Core code. <literal>lets</literal>
397 are bad, <literal>cases</literal> are good, dictionaries
398 (<literal>d.<Class>.<Unique></literal>) [or
399 anything overloading-ish] are bad, nested lambdas are
400 bad, explicit data constructors are good, primitive
401 operations (e.g., <literal>eqInt#</literal>) are
407 <term>Use strictness annotations:</term>
409 <para>Putting a strictness annotation ('!') on a constructor
410 field helps in two ways: it adds strictness to the program,
411 which gives the strictness analyser more to work with, and
412 it might help to reduce space leaks.</para>
414 <para>It can also help in a third way: when used with
415 <option>-funbox-strict-fields</option> (see <xref
416 linkend="options-f">), a strict field can be unpacked or
417 unboxed in the constructor, and one or more levels of
418 indirection may be removed. Unpacking only happens for
419 single-constructor datatypes (<literal>Int</literal> is a
420 good candidate, for example).</para>
422 <para>Using <option>-funbox-strict-fields</option> is only
423 really a good idea in conjunction with <option>-O</option>,
424 because otherwise the extra packing and unpacking won't be
425 optimised away. In fact, it is possible that
426 <option>-funbox-strict-fields</option> may worsen
427 performance even <emphasis>with</emphasis>
428 <option>-O</option>, but this is unlikely (let us know if it
429 happens to you).</para>
434 <term>Use unboxed types (a GHC extension):</term>
436 <para>When you are <Emphasis>really</Emphasis> desperate for
437 speed, and you want to get right down to the “raw
438 bits.” Please see <XRef LinkEnd="glasgow-unboxed"> for
439 some information about using unboxed types.</para>
441 <para>Before resorting to explicit unboxed types, try using
442 strict constructor fields and
443 <option>-funbox-strict-fields</option> first (see above).
444 That way, your code stays portable.</para>
449 <term>Use <literal>foreign import</literal> (a GHC extension) to plug into fast libraries:</term>
451 <para>This may take real work, but… There exist piles
452 of massively-tuned library code, and the best thing is not
453 to compete with it, but link with it.</para>
455 <para><XRef LinkEnd="ffi"> describes the foreign function
461 <term>Don't use <literal>Float</literal>s:</term>
463 <para>If you're using <literal>Complex</literal>, definitely
464 use <literal>Complex Double</literal> rather than
465 <literal>Complex Float</literal> (the former is specialised
466 heavily, but the latter isn't).</para>
468 <para><literal>Floats</literal> (probably 32-bits) are
469 almost always a bad idea, anyway, unless you Really Know
470 What You Are Doing. Use <literal>Double</literal>s.
471 There's rarely a speed disadvantage—modern machines
472 will use the same floating-point unit for both. With
473 <literal>Double</literal>s, you are much less likely to hang
474 yourself with numerical errors.</para>
476 <para>One time when <literal>Float</literal> might be a good
477 idea is if you have a <Emphasis>lot</Emphasis> of them, say
478 a giant array of <literal>Float</literal>s. They take up
479 half the space in the heap compared to
480 <literal>Doubles</literal>. However, this isn't true on a
481 64-bit machine.</para>
486 <term>Use unboxed arrays (<literal>UArray</literal>)</term>
488 <para>GHC supports arrays of unboxed elements, for several
489 basic arithmetic element types including
490 <literal>Int</literal> and <literal>Char</literal>: see the
491 <literal>Data.Array.Unboxed</literal> library for details.
492 These arrays are likely to be much faster than using
493 standard Haskell 98 arrays from the
494 <literal>Data.Array</literal> library.</para>
499 <term>Use a bigger heap!</term>
501 <para>If your program's GC stats
502 (<option>-S</option><indexterm><primary>-S RTS
503 option</primary></indexterm> RTS option) indicate that it's
504 doing lots of garbage-collection (say, more than 20%
505 of execution time), more memory might help—with the
506 <option>-M<size></option><indexterm><primary>-M<size>
507 RTS option</primary></indexterm> or
508 <option>-A<size></option><indexterm><primary>-A<size>
509 RTS option</primary></indexterm> RTS options (see <XRef
510 LinkEnd="rts-options-gc">).</para>
512 <para>This is especially important if your program uses a
513 lot of mutable arrays of pointers or mutable variables
514 (i.e. <literal>STArray</literal>,
515 <literal>IOArray</literal>, <literal>STRef</literal> and
516 <literal>IORef</literal>, but not <literal>UArray</literal>,
517 <literal>STUArray</literal> or <literal>IOUArray</literal>).
518 GHC's garbage collector currently scans these objects on
519 every collection, so your program won't benefit from
520 generational GC in the normal way if you use lots of
521 these. Increasing the heap size to reduce the number of
522 collections will probably help.</para>
530 <title>Smaller: producing a program that is smaller
534 <indexterm><primary>smaller programs, how to produce</primary></indexterm>
538 Decrease the “go-for-it” threshold for unfolding smallish
540 <option>-funfolding-use-threshold0</option><indexterm><primary>-funfolding-use-threshold0
541 option</primary></indexterm> option for the extreme case. (“Only unfoldings with
542 zero cost should proceed.”) Warning: except in certain specialised
543 cases (like Happy parsers) this is likely to actually
544 <Emphasis>increase</Emphasis> the size of your program, because unfolding
545 generally enables extra simplifying optimisations to be performed.
549 Avoid <Function>Read</Function>.
553 Use <literal>strip</literal> on your executables.
558 <sect1 id="thriftier">
559 <title>Thriftier: producing a program that gobbles less heap space
563 <indexterm><primary>memory, using less heap</primary></indexterm>
564 <indexterm><primary>space-leaks, avoiding</primary></indexterm>
565 <indexterm><primary>heap space, using less</primary></indexterm>
569 “I think I have a space leak…” Re-run your program
570 with <option>+RTS -Sstderr</option>, and remove all doubt! (You'll
571 see the heap usage get bigger and bigger…)
572 [Hmmm…this might be even easier with the
573 <option>-G1</option> RTS option; so… <Command>./a.out +RTS
574 -Sstderr -G1</Command>...]
575 <indexterm><primary>-G RTS option</primary></indexterm>
576 <indexterm><primary>-Sstderr RTS option</primary></indexterm>
580 Once again, the profiling facilities (<XRef LinkEnd="profiling">) are
581 the basic tool for demystifying the space behaviour of your program.
585 Strict functions are good for space usage, as they are for time, as
586 discussed in the previous section. Strict functions get right down to
587 business, rather than filling up the heap with closures (the system's
588 notes to itself about how to evaluate something, should it eventually
597 ;;; Local Variables: ***
599 ;;; sgml-parent-document: ("users_guide.sgml" "book" "chapter") ***