1 <?xml version="1.0" encoding="iso-8859-1"?>
2 <chapter id="sooner-faster-quicker">
3 <title>Advice on: sooner, faster, smaller, thriftier</title>
5 <para>Please advise us of other “helpful hints” that
9 <title>Sooner: producing a program more quickly
12 <indexterm><primary>compiling faster</primary></indexterm>
13 <indexterm><primary>faster compiling</primary></indexterm>
17 <term>Don't use <option>-O</option> or (especially) <option>-O2</option>:</term>
19 <para>By using them, you are telling GHC that you are
20 willing to suffer longer compilation times for
21 better-quality code.</para>
23 <para>GHC is surprisingly zippy for normal compilations
24 without <option>-O</option>!</para>
29 <term>Use more memory:</term>
31 <para>Within reason, more memory for heap space means less
32 garbage collection for GHC, which means less compilation
33 time. If you use the <option>-Rghc-timing</option> option,
34 you'll get a garbage-collector report. (Again, you can use
35 the cheap-and-nasty <option>+RTS -Sstderr -RTS</option>
36 option to send the GC stats straight to standard
39 <para>If it says you're using more than 20% of total
40 time in garbage collecting, then more memory would
43 <para>If the heap size is approaching the maximum (64M by
44 default), and you have lots of memory, try increasing the
46 <option>-M<size></option><indexterm><primary>-M<size>
47 option</primary></indexterm> option, e.g.: <command>ghc -c
48 -O -M1024m Foo.hs</command>.</para>
50 <para>Increasing the default allocation area size used by
51 the compiler's RTS might also help: use the
52 <option>-A<size></option><indexterm><primary>-A<size>
53 option</primary></indexterm> option.</para>
55 <para>If GHC persists in being a bad memory citizen, please
56 report it as a bug.</para>
61 <term>Don't use too much memory!</term>
63 <para>As soon as GHC plus its “fellow citizens”
64 (other processes on your machine) start using more than the
65 <emphasis>real memory</emphasis> on your machine, and the
66 machine starts “thrashing,” <emphasis>the party
67 is over</emphasis>. Compile times will be worse than
68 terrible! Use something like the csh-builtin
69 <command>time</command> command to get a report on how many
70 page faults you're getting.</para>
72 <para>If you don't know what virtual memory, thrashing, and
73 page faults are, or you don't know the memory configuration
74 of your machine, <emphasis>don't</emphasis> try to be clever
75 about memory use: you'll just make your life a misery (and
76 for other people, too, probably).</para>
81 <term>Try to use local disks when linking:</term>
83 <para>Because Haskell objects and libraries tend to be
84 large, it can take many real seconds to slurp the bits
85 to/from a remote filesystem.</para>
87 <para>It would be quite sensible to
88 <emphasis>compile</emphasis> on a fast machine using
89 remotely-mounted disks; then <emphasis>link</emphasis> on a
90 slow machine that had your disks directly mounted.</para>
95 <term>Don't derive/use <function>Read</function> unnecessarily:</term>
97 <para>It's ugly and slow.</para>
102 <term>GHC compiles some program constructs slowly:</term>
104 <para>Deeply-nested list comprehensions seem to be one such;
105 in the past, very large constant tables were bad,
108 <para>We'd rather you reported such behaviour as a bug, so
109 that we can try to correct it.</para>
111 <para>The part of the compiler that is occasionally prone to
112 wandering off for a long time is the strictness analyser.
113 You can turn this off individually with
114 <option>-fno-strictness</option>.
115 <indexterm><primary>-fno-strictness
116 anti-option</primary></indexterm></para>
118 <para>To figure out which part of the compiler is badly
120 <option>-v2</option><indexterm><primary><option>-v</option></primary>
121 </indexterm> option is your friend.</para>
123 <para>If your module has big wads of constant data, GHC may
124 produce a huge basic block that will cause the native-code
125 generator's register allocator to founder. Bring on
126 <option>-fvia-C</option><indexterm><primary>-fvia-C
127 option</primary></indexterm> (not that GCC will be that
128 quick about it, either).</para>
133 <term>Explicit <literal>import</literal> declarations:</term>
135 <para>Instead of saying <literal>import Foo</literal>, say
136 <literal>import Foo (...stuff I want...)</literal> You can
137 get GHC to tell you the minimal set of required imports by
138 using the <option>-ddump-minimal-imports</option> option
139 (see <xref linkend="hi-options"/>).</para>
141 <para>Truthfully, the reduction on compilation time will be
142 very small. However, judicious use of
143 <literal>import</literal> declarations can make a program
144 easier to understand, so it may be a good idea
152 <title>Faster: producing a program that runs quicker</title>
154 <indexterm><primary>faster programs, how to produce</primary></indexterm>
156 <para>The key tool to use in making your Haskell program run
157 faster are GHC's profiling facilities, described separately in
158 <xref linkend="profiling"/>. There is <emphasis>no
159 substitute</emphasis> for finding where your program's time/space
160 is <emphasis>really</emphasis> going, as opposed to where you
161 imagine it is going.</para>
163 <para>Another point to bear in mind: By far the best way to
164 improve a program's performance <emphasis>dramatically</emphasis>
165 is to use better algorithms. Once profiling has thrown the
166 spotlight on the guilty time-consumer(s), it may be better to
167 re-think your program than to try all the tweaks listed below.</para>
169 <para>Another extremely efficient way to make your program snappy
170 is to use library code that has been Seriously Tuned By Someone
171 Else. You <emphasis>might</emphasis> be able to write a better
172 quicksort than the one in <literal>Data.List</literal>, but it
173 will take you much longer than typing <literal>import
174 Data.List</literal>.</para>
176 <para>Please report any overly-slow GHC-compiled programs. Since
177 GHC doesn't have any credible competition in the performance
178 department these days it's hard to say what overly-slow means, so
179 just use your judgement! Of course, if a GHC compiled program
180 runs slower than the same program compiled with NHC or Hugs, then
181 it's definitely a bug.</para>
185 <term>Optimise, using <option>-O</option> or <option>-O2</option>:</term>
187 <para>This is the most basic way to make your program go
188 faster. Compilation time will be slower, especially with
189 <option>-O2</option>.</para>
191 <para>At present, <option>-O2</option> is nearly
192 indistinguishable from <option>-O</option>.</para>
197 <term>Compile via C and crank up GCC:</term>
199 <para>The native code-generator is designed to be quick, not
200 mind-bogglingly clever. Better to let GCC have a go, as it
201 tries much harder on register allocation, etc.</para>
203 <para>At the moment, if you turn on <option>-O</option> you
204 get GCC instead. This may change in the future.</para>
206 <para>So, when we want very fast code, we use: <option>-O
207 -fvia-C</option>.</para>
212 <term>Overloaded functions are not your friend:</term>
214 <para>Haskell's overloading (using type classes) is elegant,
215 neat, etc., etc., but it is death to performance if left to
216 linger in an inner loop. How can you squash it?</para>
220 <term>Give explicit type signatures:</term>
222 <para>Signatures are the basic trick; putting them on
223 exported, top-level functions is good
224 software-engineering practice, anyway. (Tip: using
225 <option>-fwarn-missing-signatures</option><indexterm><primary>-fwarn-missing-signatures
226 option</primary></indexterm> can help enforce good
227 signature-practice).</para>
229 <para>The automatic specialisation of overloaded
230 functions (with <option>-O</option>) should take care
231 of overloaded local and/or unexported functions.</para>
236 <term>Use <literal>SPECIALIZE</literal> pragmas:</term>
238 <indexterm><primary>SPECIALIZE pragma</primary></indexterm>
239 <indexterm><primary>overloading, death to</primary></indexterm>
241 <para>Specialize the overloading on key functions in
242 your program. See <xref linkend="specialize-pragma"/>
243 and <xref linkend="specialize-instance-pragma"/>.</para>
248 <term>“But how do I know where overloading is creeping in?”:</term>
250 <para>A low-tech way: grep (search) your interface
251 files for overloaded type signatures. You can view
252 interface files using the
253 <option>--show-iface</option> option (see <xref
254 linkend="hi-options"/>).
257 % ghc --show-iface Foo.hi | egrep '^[a-z].*::.*=>'
267 <term>Strict functions are your dear friends:</term>
269 <para>and, among other things, lazy pattern-matching is your
272 <para>(If you don't know what a “strict
273 function” is, please consult a functional-programming
274 textbook. A sentence or two of explanation here probably
275 would not do much good.)</para>
277 <para>Consider these two code fragments:
280 f (Wibble x y) = ... # strict
282 f arg = let { (Wibble x y) = arg } in ... # lazy
285 The former will result in far better code.</para>
287 <para>A less contrived example shows the use of
288 <literal>cases</literal> instead of <literal>lets</literal>
289 to get stricter code (a good thing):
292 f (Wibble x y) # beautiful but slow
294 (a1, b1, c1) = unpackFoo x
295 (a2, b2, c2) = unpackFoo y
298 f (Wibble x y) # ugly, and proud of it
299 = case (unpackFoo x) of { (a1, b1, c1) ->
300 case (unpackFoo y) of { (a2, b2, c2) ->
310 <term>GHC loves single-constructor data-types:</term>
312 <para>It's all the better if a function is strict in a
313 single-constructor type (a type with only one
314 data-constructor; for example, tuples are single-constructor
320 <term>Newtypes are better than datatypes:</term>
322 <para>If your datatype has a single constructor with a
323 single field, use a <literal>newtype</literal> declaration
324 instead of a <literal>data</literal> declaration. The
325 <literal>newtype</literal> will be optimised away in most
331 <term>“How do I find out a function's strictness?”</term>
333 <para>Don't guess—look it up.</para>
335 <para>Look for your function in the interface file, then for
336 the third field in the pragma; it should say
337 <literal>__S <string></literal>. The
338 <literal><string></literal> gives the strictness of
339 the function's arguments. <function>L</function> is lazy
340 (bad), <function>S</function> and <function>E</function> are
341 strict (good), <function>P</function> is
342 “primitive” (good), <function>U(...)</function>
343 is strict and “unpackable” (very good), and
344 <function>A</function> is absent (very good).</para>
346 <para>For an “unpackable”
347 <function>U(...)</function> argument, the info inside tells
348 the strictness of its components. So, if the argument is a
349 pair, and it says <function>U(AU(LSS))</function>, that
350 means “the first component of the pair isn't used; the
351 second component is itself unpackable, with three components
352 (lazy in the first, strict in the second \&
353 third).”</para>
355 <para>If the function isn't exported, just compile with the
356 extra flag <option>-ddump-simpl</option>; next to the
357 signature for any binder, it will print the self-same
358 pragmatic information as would be put in an interface file.
359 (Besides, Core syntax is fun to look at!)</para>
364 <term>Force key functions to be <literal>INLINE</literal>d (esp. monads):</term>
366 <para>Placing <literal>INLINE</literal> pragmas on certain
367 functions that are used a lot can have a dramatic effect.
368 See <xref linkend="inline-pragma"/>.</para>
373 <term>Explicit <literal>export</literal> list:</term>
375 <para>If you do not have an explicit export list in a
376 module, GHC must assume that everything in that module will
377 be exported. This has various pessimising effects. For
378 example, if a bit of code is actually
379 <emphasis>unused</emphasis> (perhaps because of unfolding
380 effects), GHC will not be able to throw it away, because it
381 is exported and some other module may be relying on its
384 <para>GHC can be quite a bit more aggressive with pieces of
385 code if it knows they are not exported.</para>
390 <term>Look at the Core syntax!</term>
392 <para>(The form in which GHC manipulates your code.) Just
393 run your compilation with <option>-ddump-simpl</option>
394 (don't forget the <option>-O</option>).</para>
396 <para>If profiling has pointed the finger at particular
397 functions, look at their Core code. <literal>lets</literal>
398 are bad, <literal>cases</literal> are good, dictionaries
399 (<literal>d.<Class>.<Unique></literal>) [or
400 anything overloading-ish] are bad, nested lambdas are
401 bad, explicit data constructors are good, primitive
402 operations (e.g., <literal>eqInt#</literal>) are
408 <term>Use strictness annotations:</term>
410 <para>Putting a strictness annotation ('!') on a constructor
411 field helps in two ways: it adds strictness to the program,
412 which gives the strictness analyser more to work with, and
413 it might help to reduce space leaks.</para>
415 <para>It can also help in a third way: when used with
416 <option>-funbox-strict-fields</option> (see <xref
417 linkend="options-f"/>), a strict field can be unpacked or
418 unboxed in the constructor, and one or more levels of
419 indirection may be removed. Unpacking only happens for
420 single-constructor datatypes (<literal>Int</literal> is a
421 good candidate, for example).</para>
423 <para>Using <option>-funbox-strict-fields</option> is only
424 really a good idea in conjunction with <option>-O</option>,
425 because otherwise the extra packing and unpacking won't be
426 optimised away. In fact, it is possible that
427 <option>-funbox-strict-fields</option> may worsen
428 performance even <emphasis>with</emphasis>
429 <option>-O</option>, but this is unlikely (let us know if it
430 happens to you).</para>
435 <term>Use unboxed types (a GHC extension):</term>
437 <para>When you are <emphasis>really</emphasis> desperate for
438 speed, and you want to get right down to the “raw
439 bits.” Please see <xref linkend="glasgow-unboxed"/> for
440 some information about using unboxed types.</para>
442 <para>Before resorting to explicit unboxed types, try using
443 strict constructor fields and
444 <option>-funbox-strict-fields</option> first (see above).
445 That way, your code stays portable.</para>
450 <term>Use <literal>foreign import</literal> (a GHC extension) to plug into fast libraries:</term>
452 <para>This may take real work, but… There exist piles
453 of massively-tuned library code, and the best thing is not
454 to compete with it, but link with it.</para>
456 <para><xref linkend="ffi"/> describes the foreign function
462 <term>Don't use <literal>Float</literal>s:</term>
464 <para>If you're using <literal>Complex</literal>, definitely
465 use <literal>Complex Double</literal> rather than
466 <literal>Complex Float</literal> (the former is specialised
467 heavily, but the latter isn't).</para>
469 <para><literal>Floats</literal> (probably 32-bits) are
470 almost always a bad idea, anyway, unless you Really Know
471 What You Are Doing. Use <literal>Double</literal>s.
472 There's rarely a speed disadvantage—modern machines
473 will use the same floating-point unit for both. With
474 <literal>Double</literal>s, you are much less likely to hang
475 yourself with numerical errors.</para>
477 <para>One time when <literal>Float</literal> might be a good
478 idea is if you have a <emphasis>lot</emphasis> of them, say
479 a giant array of <literal>Float</literal>s. They take up
480 half the space in the heap compared to
481 <literal>Doubles</literal>. However, this isn't true on a
482 64-bit machine.</para>
487 <term>Use unboxed arrays (<literal>UArray</literal>)</term>
489 <para>GHC supports arrays of unboxed elements, for several
490 basic arithmetic element types including
491 <literal>Int</literal> and <literal>Char</literal>: see the
492 <literal>Data.Array.Unboxed</literal> library for details.
493 These arrays are likely to be much faster than using
494 standard Haskell 98 arrays from the
495 <literal>Data.Array</literal> library.</para>
500 <term>Use a bigger heap!</term>
502 <para>If your program's GC stats
503 (<option>-S</option><indexterm><primary>-S RTS
504 option</primary></indexterm> RTS option) indicate that it's
505 doing lots of garbage-collection (say, more than 20%
506 of execution time), more memory might help—with the
507 <option>-M<size></option><indexterm><primary>-M<size>
508 RTS option</primary></indexterm> or
509 <option>-A<size></option><indexterm><primary>-A<size>
510 RTS option</primary></indexterm> RTS options (see <xref
511 linkend="rts-options-gc"/>).</para>
513 <para>This is especially important if your program uses a
514 lot of mutable arrays of pointers or mutable variables
515 (i.e. <literal>STArray</literal>,
516 <literal>IOArray</literal>, <literal>STRef</literal> and
517 <literal>IORef</literal>, but not <literal>UArray</literal>,
518 <literal>STUArray</literal> or <literal>IOUArray</literal>).
519 GHC's garbage collector currently scans these objects on
520 every collection, so your program won't benefit from
521 generational GC in the normal way if you use lots of
522 these. Increasing the heap size to reduce the number of
523 collections will probably help.</para>
531 <title>Smaller: producing a program that is smaller
535 <indexterm><primary>smaller programs, how to produce</primary></indexterm>
539 Decrease the “go-for-it” threshold for unfolding smallish
541 <option>-funfolding-use-threshold0</option><indexterm><primary>-funfolding-use-threshold0
542 option</primary></indexterm> option for the extreme case. (“Only unfoldings with
543 zero cost should proceed.”) Warning: except in certain specialised
544 cases (like Happy parsers) this is likely to actually
545 <emphasis>increase</emphasis> the size of your program, because unfolding
546 generally enables extra simplifying optimisations to be performed.
550 Avoid <function>Read</function>.
554 Use <literal>strip</literal> on your executables.
559 <sect1 id="thriftier">
560 <title>Thriftier: producing a program that gobbles less heap space
564 <indexterm><primary>memory, using less heap</primary></indexterm>
565 <indexterm><primary>space-leaks, avoiding</primary></indexterm>
566 <indexterm><primary>heap space, using less</primary></indexterm>
570 “I think I have a space leak…” Re-run your program
571 with <option>+RTS -Sstderr</option>, and remove all doubt! (You'll
572 see the heap usage get bigger and bigger…)
573 [Hmmm…this might be even easier with the
574 <option>-G1</option> RTS option; so… <command>./a.out +RTS
575 -Sstderr -G1</command>...]
576 <indexterm><primary>-G RTS option</primary></indexterm>
577 <indexterm><primary>-Sstderr RTS option</primary></indexterm>
581 Once again, the profiling facilities (<xref linkend="profiling"/>) are
582 the basic tool for demystifying the space behaviour of your program.
586 Strict functions are good for space usage, as they are for time, as
587 discussed in the previous section. Strict functions get right down to
588 business, rather than filling up the heap with closures (the system's
589 notes to itself about how to evaluate something, should it eventually
598 ;;; Local Variables: ***
600 ;;; sgml-parent-document: ("users_guide.xml" "book" "chapter") ***