ghc/docs/users_guide/sooner.sgml

   1 <Chapter id="sooner-faster-quicker">
   2 <Title>Advice on: sooner, faster, smaller, stingier
   3 </Title>
   4
   5 <Para>
   6 Please advise us of other &ldquo;helpful hints&rdquo; that should go here!
   7 </Para>
   8
   9 <Sect1 id="sooner">
  10 <Title>Sooner: producing a program more quickly
  11 </Title>
  12
  13 <Para>
  14 <IndexTerm><Primary>compiling faster</Primary></IndexTerm>
  15 <IndexTerm><Primary>faster compiling</Primary></IndexTerm>
  16 <VariableList>
  17
  18 <VarListEntry>
  19 <Term>Don't use <Option>-O</Option> or (especially) <Option>-O2</Option>:</Term>
  20 <ListItem>
  21 <Para>
  22 By using them, you are telling GHC that you are willing to suffer
  23 longer compilation times for better-quality code.
  24 </Para>
  25
  26 <Para>
  27 GHC is surprisingly zippy for normal compilations without <Option>-O</Option>!
  28 </Para>
  29 </ListItem>
  30 </VarListEntry>
  31 <VarListEntry>
  32 <Term>Use more memory:</Term>
  33 <ListItem>
  34 <Para>
  35 Within reason, more memory for heap space means less garbage
  36 collection for GHC, which means less compilation time.  If you use
  37 the <Option>-Rgc-stats</Option> option, you'll get a garbage-collector report.
  38 (Again, you can use the cheap-and-nasty <Option>-optCrts-Sstderr</Option> option to
  39 send the GC stats straight to standard error.)
  40 </Para>
  41
  42 <Para>
  43 If it says you're using more than 20&percnt; of total time in garbage
  44 collecting, then more memory would help.
  45 </Para>
  46
  47 <Para>
  48 If the heap size is approaching the maximum (64M by default), and you
  49 have lots of memory, try increasing the maximum with the
  50 <Option>-M&lt;size&gt;</Option><IndexTerm><Primary>-M&lt;size&gt; option</Primary></IndexTerm> option, e.g.: <Command>ghc -c -O
  51 -M1024m Foo.hs</Command>.
  52 </Para>
  53
  54 <Para>
  55 Increasing the default allocation area size used by the compiler's RTS
  56 might also help: use the <Option>-A&lt;size&gt;</Option><IndexTerm><Primary>-A&lt;size&gt; option</Primary></IndexTerm>
  57 option.
  58 </Para>
  59
  60 <Para>
  61 If GHC persists in being a bad memory citizen, please report it as a
  62 bug.
  63 </Para>
  64 </ListItem>
  65 </VarListEntry>
  66 <VarListEntry>
  67 <Term>Don't use too much memory!</Term>
  68 <ListItem>
  69 <Para>
  70 As soon as GHC plus its &ldquo;fellow citizens&rdquo; (other processes on your
  71 machine) start using more than the <Emphasis>real memory</Emphasis> on your
  72 machine, and the machine starts &ldquo;thrashing,&rdquo; <Emphasis>the party is
  73 over</Emphasis>.  Compile times will be worse than terrible!  Use something
  74 like the csh-builtin <Command>time</Command> command to get a report on how many page
  75 faults you're getting.
  76 </Para>
  77
  78 <Para>
  79 If you don't know what virtual memory, thrashing, and page faults are,
  80 or you don't know the memory configuration of your machine,
  81 <Emphasis>don't</Emphasis> try to be clever about memory use: you'll just make
  82 your life a misery (and for other people, too, probably).
  83 </Para>
  84 </ListItem>
  85 </VarListEntry>
  86 <VarListEntry>
  87 <Term>Try to use local disks when linking:</Term>
  88 <ListItem>
  89 <Para>
  90 Because Haskell objects and libraries tend to be large, it can take
  91 many real seconds to slurp the bits to/from a remote filesystem.
  92 </Para>
  93
  94 <Para>
  95 It would be quite sensible to <Emphasis>compile</Emphasis> on a fast machine using
  96 remotely-mounted disks; then <Emphasis>link</Emphasis> on a slow machine that had
  97 your disks directly mounted.
  98 </Para>
  99 </ListItem>
 100 </VarListEntry>
 101 <VarListEntry>
 102 <Term>Don't derive/use <Function>Read</Function> unnecessarily:</Term>
 103 <ListItem>
 104 <Para>
 105 It's ugly and slow.
 106 </Para>
 107 </ListItem>
 108 </VarListEntry>
 109 <VarListEntry>
 110 <Term>GHC compiles some program constructs slowly:</Term>
 111 <ListItem>
 112 <Para>
 113 Deeply-nested list comprehensions seem to be one such; in the past,
 114 very large constant tables were bad, too.
 115 </Para>
 116
 117 <Para>
 118 We'd rather you reported such behaviour as a bug, so that we can try
 119 to correct it.
 120 </Para>
 121
 122 <Para>
 123 The parts of the compiler that seem most prone to wandering off for a
 124 long time are the abstract interpreters (strictness and update
 125 analysers).  You can turn these off individually with
 126 <Option>-fno-strictness</Option><IndexTerm><Primary>-fno-strictness anti-option</Primary></IndexTerm> and
 127 <Option>-fno-update-analysis</Option>.<IndexTerm><Primary>-fno-update-analysis anti-option</Primary></IndexTerm>
 128 </Para>
 129
 130 <Para>
 131 To figure out which part of the compiler is badly behaved, the
 132 <Option>-dshow-passes</Option><IndexTerm><Primary>-dshow-passes option</Primary></IndexTerm> option is your
 133 friend.
 134 </Para>
 135
 136 <Para>
 137 If your module has big wads of constant data, GHC may produce a huge
 138 basic block that will cause the native-code generator's register
 139 allocator to founder.  Bring on <Option>-fvia-C</Option><IndexTerm><Primary>-fvia-C option</Primary></IndexTerm>
 140 (not that GCC will be that quick about it, either).
 141 </Para>
 142 </ListItem>
 143 </VarListEntry>
 144 <VarListEntry>
 145 <Term>Avoid the consistency-check on linking:</Term>
 146 <ListItem>
 147 <Para>
 148 Use <Option>-no-link-chk</Option><IndexTerm><Primary>-no-link-chk</Primary></IndexTerm>; saves effort.  This is
 149 probably safe in a I-only-compile-things-one-way setup.
 150 </Para>
 151 </ListItem>
 152 </VarListEntry>
 153 <VarListEntry>
 154 <Term>Explicit <Literal>import</Literal> declarations:</Term>
 155 <ListItem>
 156 <Para>
 157 Instead of saying <Literal>import Foo</Literal>, say <Literal>import Foo (...stuff I want...)</Literal>.
 158 </Para>
 159
 160 <Para>
 161 Truthfully, the reduction on compilation time will be very small.
 162 However, judicious use of <Literal>import</Literal> declarations can make a
 163 program easier to understand, so it may be a good idea anyway.
 164 </Para>
 165 </ListItem>
 166 </VarListEntry>
 167 </VariableList>
 168 </Para>
 169
 170 </Sect1>
 171
 172 <Sect1 id="faster">
 173 <Title>Faster: producing a program that runs quicker
 174 </Title>
 175
 176 <Para>
 177 <IndexTerm><Primary>faster programs, how to produce</Primary></IndexTerm>
 178 </Para>
 179
 180 <Para>
 181 The key tool to use in making your Haskell program run faster are
 182 GHC's profiling facilities, described separately in <XRef LinkEnd="profiling">.  There is <Emphasis>no substitute</Emphasis> for
 183 finding where your program's time/space is <Emphasis>really</Emphasis> going, as
 184 opposed to where you imagine it is going.
 185 </Para>
 186
 187 <Para>
 188 Another point to bear in mind: By far the best way to improve a
 189 program's performance <Emphasis>dramatically</Emphasis> is to use better
 190 algorithms.  Once profiling has thrown the spotlight on the guilty
 191 time-consumer(s), it may be better to re-think your program than to
 192 try all the tweaks listed below.
 193 </Para>
 194
 195 <Para>
 196 Another extremely efficient way to make your program snappy is to use
 197 library code that has been Seriously Tuned By Someone Else.  You
 198 <Emphasis>might</Emphasis> be able to write a better quicksort than the one in the
 199 HBC library, but it will take you much longer than typing <Literal>import
 200 QSort</Literal>.  (Incidentally, it doesn't hurt if the Someone Else is Lennart
 201 Augustsson.)
 202 </Para>
 203
 204 <Para>
 205 Please report any overly-slow GHC-compiled programs.  The current
 206 definition of &ldquo;overly-slow&rdquo; is &ldquo;the HBC-compiled version ran
 207 faster&rdquo;&hellip;
 208 </Para>
 209
 210 <Para>
 211 <VariableList>
 212
 213 <VarListEntry>
 214 <Term>Optimise, using <Option>-O</Option> or <Option>-O2</Option>:</Term>
 215 <ListItem>
 216 <Para>
 217 This is the most basic way
 218 to make your program go faster.  Compilation time will be slower,
 219 especially with <Option>-O2</Option>.
 220 </Para>
 221
 222 <Para>
 223 At present, <Option>-O2</Option> is nearly indistinguishable from <Option>-O</Option>.
 224 </Para>
 225 </ListItem>
 226 </VarListEntry>
 227 <VarListEntry>
 228 <Term>Compile via C and crank up GCC:</Term>
 229 <ListItem>
 230 <Para>
 231 Even with <Option>-O</Option>, GHC tries to
 232 use a native-code generator, if available.  But the native
 233 code-generator is designed to be quick, not mind-bogglingly clever.
 234 Better to let GCC have a go, as it tries much harder on register
 235 allocation, etc.
 236 </Para>
 237
 238 <Para>
 239 So, when we want very fast code, we use: <Option>-O -fvia-C -O2-for-C</Option>.
 240 </Para>
 241 </ListItem>
 242 </VarListEntry>
 243 <VarListEntry>
 244 <Term>Overloaded functions are not your friend:</Term>
 245 <ListItem>
 246 <Para>
 247 Haskell's overloading (using type classes) is elegant, neat, etc.,
 248 etc., but it is death to performance if left to linger in an inner
 249 loop.  How can you squash it?
 250 </Para>
 251
 252 <Para>
 253 <VariableList>
 254
 255 <VarListEntry>
 256 <Term>Give explicit type signatures:</Term>
 257 <ListItem>
 258 <Para>
 259 Signatures are the basic trick; putting them on exported, top-level
 260 functions is good software-engineering practice, anyway.  (Tip: using
 261 <Option>-fwarn-missing-signatures</Option><IndexTerm><Primary>-fwarn-missing-signatures
 262 option</Primary></IndexTerm> can help enforce good signature-practice).
 263 </Para>
 264
 265 <Para>
 266 The automatic specialisation of overloaded functions (with <Option>-O</Option>)
 267 should take care of overloaded local and/or unexported functions.
 268 </Para>
 269 </ListItem>
 270 </VarListEntry>
 271 <VarListEntry>
 272 <Term>Use <Literal>SPECIALIZE</Literal> pragmas:</Term>
 273 <ListItem>
 274 <Para>
 275 <IndexTerm><Primary>SPECIALIZE pragma</Primary></IndexTerm>
 276 <IndexTerm><Primary>overloading, death to</Primary></IndexTerm>
 277 </Para>
 278
 279 <Para>
 280 Specialize the overloading on key functions in your program.  See
 281 <XRef LinkEnd="specialize-pragma"> and
 282 <XRef LinkEnd="specialize-instance-pragma">.
 283 </Para>
 284 </ListItem>
 285 </VarListEntry>
 286 <VarListEntry>
 287 <Term>&ldquo;But how do I know where overloading is creeping in?&rdquo;:</Term>
 288 <ListItem>
 289 <Para>
 290 A low-tech way: grep (search) your interface files for overloaded
 291 type signatures; e.g.,:
 292
 293 <ProgramListing>
 294 % egrep '^[a-z].*::.*=&#62;' *.hi
 295 </ProgramListing>
 296
 297 </Para>
 298 </ListItem>
 299 </VarListEntry>
 300 </VariableList>
 301 </Para>
 302 </ListItem>
 303 </VarListEntry>
 304 <VarListEntry>
 305 <Term>Strict functions are your dear friends:</Term>
 306 <ListItem>
 307 <Para>
 308 and, among other things, lazy pattern-matching is your enemy.
 309 </Para>
 310
 311 <Para>
 312 (If you don't know what a &ldquo;strict function&rdquo; is, please consult a
 313 functional-programming textbook.  A sentence or two of
 314 explanation here probably would not do much good.)
 315 </Para>
 316
 317 <Para>
 318 Consider these two code fragments:
 319
 320 <ProgramListing>
 321 f (Wibble x y) =  ... # strict
 322
 323 f arg = let { (Wibble x y) = arg } in ... # lazy
 324 </ProgramListing>
 325
 326 The former will result in far better code.
 327 </Para>
 328
 329 <Para>
 330 A less contrived example shows the use of <Literal>cases</Literal> instead
 331 of <Literal>lets</Literal> to get stricter code (a good thing):
 332
 333 <ProgramListing>
 334 f (Wibble x y)  # beautiful but slow
 335   = let
 336         (a1, b1, c1) = unpackFoo x
 337         (a2, b2, c2) = unpackFoo y
 338     in ...
 339
 340 f (Wibble x y)  # ugly, and proud of it
 341   = case (unpackFoo x) of { (a1, b1, c1) -&#62;
 342     case (unpackFoo y) of { (a2, b2, c2) -&#62;
 343     ...
 344     }}
 345 </ProgramListing>
 346
 347 </Para>
 348 </ListItem>
 349 </VarListEntry>
 350 <VarListEntry>
 351 <Term>GHC loves single-constructor data-types:</Term>
 352 <ListItem>
 353 <Para>
 354 It's all the better if a function is strict in a single-constructor
 355 type (a type with only one data-constructor; for example, tuples are
 356 single-constructor types).
 357 </Para>
 358 </ListItem>
 359 </VarListEntry>
 360 <VarListEntry>
 361 <Term>Newtypes are better than datatypes:</Term>
 362 <ListItem>
 363 <Para>
 364 If your datatype has a single constructor with a single field, use a
 365 <Literal>newtype</Literal> declaration instead of a <Literal>data</Literal> declaration.  The <Literal>newtype</Literal>
 366 will be optimised away in most cases.
 367 </Para>
 368 </ListItem>
 369 </VarListEntry>
 370 <VarListEntry>
 371 <Term>&ldquo;How do I find out a function's strictness?&rdquo;</Term>
 372 <ListItem>
 373 <Para>
 374 Don't guess&mdash;look it up.
 375 </Para>
 376
 377 <Para>
 378 Look for your function in the interface file, then for the third field
 379 in the pragma; it should say <Literal>&lowbar;S&lowbar; &lt;string&gt;</Literal>.  The <Literal>&lt;string&gt;</Literal>
 380 gives the strictness of the function's arguments.  <Function>L</Function> is lazy
 381 (bad), <Function>S</Function> and <Function>E</Function> are strict (good), <Function>P</Function> is &ldquo;primitive&rdquo; (good),
 382 <Function>U(...)</Function> is strict and
 383 &ldquo;unpackable&rdquo; (very good), and <Function>A</Function> is absent (very good).
 384 </Para>
 385
 386 <Para>
 387 For an &ldquo;unpackable&rdquo; <Function>U(...)</Function> argument, the info inside
 388 tells the strictness of its components.  So, if the argument is a
 389 pair, and it says <Function>U(AU(LSS))</Function>, that means &ldquo;the first component of the
 390 pair isn't used; the second component is itself unpackable, with three
 391 components (lazy in the first, strict in the second \&#38; third).&rdquo;
 392 </Para>
 393
 394 <Para>
 395 If the function isn't exported, just compile with the extra flag <Option>-ddump-simpl</Option>;
 396 next to the signature for any binder, it will print the self-same
 397 pragmatic information as would be put in an interface file.
 398 (Besides, Core syntax is fun to look at!)
 399 </Para>
 400 </ListItem>
 401 </VarListEntry>
 402 <VarListEntry>
 403 <Term>Force key functions to be <Literal>INLINE</Literal>d (esp. monads):</Term>
 404 <ListItem>
 405 <Para>
 406 Placing <Literal>INLINE</Literal> pragmas on certain functions that are used a lot can
 407 have a dramatic effect.  See <XRef LinkEnd="inline-pragma">.
 408 </Para>
 409 </ListItem>
 410 </VarListEntry>
 411 <VarListEntry>
 412 <Term>Explicit <Literal>export</Literal> list:</Term>
 413 <ListItem>
 414 <Para>
 415 If you do not have an explicit export list in a module, GHC must
 416 assume that everything in that module will be exported.  This has
 417 various pessimising effects.  For example, if a bit of code is actually
 418 <Emphasis>unused</Emphasis> (perhaps because of unfolding effects), GHC will not be
 419 able to throw it away, because it is exported and some other module
 420 may be relying on its existence.
 421 </Para>
 422
 423 <Para>
 424 GHC can be quite a bit more aggressive with pieces of code if it knows
 425 they are not exported.
 426 </Para>
 427 </ListItem>
 428 </VarListEntry>
 429 <VarListEntry>
 430 <Term>Look at the Core syntax!</Term>
 431 <ListItem>
 432 <Para>
 433 (The form in which GHC manipulates your code.)  Just run your
 434 compilation with <Option>-ddump-simpl</Option> (don't forget the <Option>-O</Option>).
 435 </Para>
 436
 437 <Para>
 438 If profiling has pointed the finger at particular functions, look at
 439 their Core code.  <Literal>lets</Literal> are bad, <Literal>cases</Literal> are good, dictionaries
 440 (<Literal>d.&lt;Class&gt;.&lt;Unique&gt;</Literal>) &lsqb;or anything overloading-ish&rsqb; are bad,
 441 nested lambdas are bad, explicit data constructors are good, primitive
 442 operations (e.g., <Literal>eqInt&num;</Literal>) are good,&hellip;
 443 </Para>
 444 </ListItem>
 445 </VarListEntry>
 446 <VarListEntry>
 447 <Term>Use unboxed types (a GHC extension):</Term>
 448 <ListItem>
 449 <Para>
 450 When you are <Emphasis>really</Emphasis> desperate for speed, and you want to get
 451 right down to the &ldquo;raw bits.&rdquo;  Please see <XRef LinkEnd="glasgow-unboxed"> for some information about using unboxed
 452 types.
 453 </Para>
 454 </ListItem>
 455 </VarListEntry>
 456 <VarListEntry>
 457 <Term>Use <Function>&lowbar;ccall&lowbar;s</Function> (a GHC extension) to plug into fast libraries:</Term>
 458 <ListItem>
 459 <Para>
 460 This may take real work, but&hellip; There exist piles of
 461 massively-tuned library code, and the best thing is not
 462 to compete with it, but link with it.
 463 </Para>
 464
 465 <Para>
 466 <XRef LinkEnd="glasgow-ccalls"> says a little about how to use C calls.
 467 </Para>
 468 </ListItem>
 469 </VarListEntry>
 470 <VarListEntry>
 471 <Term>Don't use <Literal>Float</Literal>s:</Term>
 472 <ListItem>
 473 <Para>
 474 We don't provide specialisations of Prelude functions for <Literal>Float</Literal>
 475 (but we do for <Literal>Double</Literal>).  If you end up executing overloaded
 476 code, you will lose on performance, perhaps badly.
 477 </Para>
 478
 479 <Para>
 480 <Literal>Floats</Literal> (probably 32-bits) are almost always a bad idea, anyway,
 481 unless you Really Know What You Are Doing.  Use Doubles.  There's
 482 rarely a speed disadvantage&mdash;modern machines will use the same
 483 floating-point unit for both.  With <Literal>Doubles</Literal>, you are much less
 484 likely to hang yourself with numerical errors.
 485 </Para>
 486
 487 <Para>
 488 One time when <Literal>Float</Literal> might be a good idea is if you have a
 489 <Emphasis>lot</Emphasis> of them, say a giant array of <Literal>Float</Literal>s.  They take up
 490 half the space in the heap compared to <Literal>Doubles</Literal>.  However, this isn't
 491 true on a 64-bit machine.
 492 </Para>
 493 </ListItem>
 494 </VarListEntry>
 495 <VarListEntry>
 496 <Term>Use a bigger heap!</Term>
 497 <ListItem>
 498 <Para>
 499 If your program's GC stats (<Option>-S</Option><IndexTerm><Primary>-S RTS option</Primary></IndexTerm> RTS option)
 500 indicate that it's doing lots of garbage-collection (say, more than
 501 20&percnt; of execution time), more memory might help&mdash;with the
 502 <Option>-M&lt;size&gt;</Option><IndexTerm><Primary>-M&lt;size&gt; RTS option</Primary></IndexTerm> or
 503 <Option>-A&lt;size&gt;</Option><IndexTerm><Primary>-A&lt;size&gt; RTS option</Primary></IndexTerm> RTS options (see
 504 <XRef LinkEnd="rts-options-gc">).
 505 </Para>
 506 </ListItem>
 507 </VarListEntry>
 508 </VariableList>
 509 </Para>
 510
 511 </Sect1>
 512
 513 <Sect1 id="smaller">
 514 <Title>Smaller: producing a program that is smaller
 515 </Title>
 516
 517 <Para>
 518 <IndexTerm><Primary>smaller programs, how to produce</Primary></IndexTerm>
 519 </Para>
 520
 521 <Para>
 522 Decrease the &ldquo;go-for-it&rdquo; threshold for unfolding smallish
 523 expressions.  Give a
 524 <Option>-funfolding-use-threshold0</Option><IndexTerm><Primary>-funfolding-use-threshold0
 525 option</Primary></IndexTerm> option for the extreme case. (&ldquo;Only unfoldings with
 526 zero cost should proceed.&rdquo;)  Warning: except in certain specialiised
 527 cases (like Happy parsers) this is likely to actually
 528 <Emphasis>increase</Emphasis> the size of your program, because unfolding
 529 generally enables extra simplifying optimisations to be performed.
 530 </Para>
 531
 532 <Para>
 533 Avoid <Function>Read</Function>.
 534 </Para>
 535
 536 <Para>
 537 Use <Literal>strip</Literal> on your executables.
 538 </Para>
 539
 540 </Sect1>
 541
 542 <Sect1 id="stingier">
 543 <Title>Stingier: producing a program that gobbles less heap space
 544 </Title>
 545
 546 <Para>
 547 <IndexTerm><Primary>memory, using less heap</Primary></IndexTerm>
 548 <IndexTerm><Primary>space-leaks, avoiding</Primary></IndexTerm>
 549 <IndexTerm><Primary>heap space, using less</Primary></IndexTerm>
 550 </Para>
 551
 552 <Para>
 553 &ldquo;I think I have a space leak&hellip;&rdquo;  Re-run your program with
 554 <Option>+RTS -Sstderr</Option>,<IndexTerm><Primary>-Sstderr RTS option</Primary></IndexTerm> and remove all doubt!
 555 (You'll see the heap usage get bigger and bigger&hellip;)  &lsqb;Hmmm&hellip;this
 556 might be even easier with the <Option>-F2s</Option><IndexTerm><Primary>-F2s RTS option</Primary></IndexTerm> RTS
 557 option; so&hellip; <Command>./a.out +RTS -Sstderr -F2s</Command>...]
 558 </Para>
 559
 560 <Para>
 561 Once again, the profiling facilities (<XRef LinkEnd="profiling">) are the basic tool for demystifying the space
 562 behaviour of your program.
 563 </Para>
 564
 565 <Para>
 566 Strict functions are good for space usage, as they are for time, as
 567 discussed in the previous section.  Strict functions get right down to
 568 business, rather than filling up the heap with closures (the system's
 569 notes to itself about how to evaluate something, should it eventually
 570 be required).
 571 </Para>
 572
 573 </Sect1>
 574
 575 </Chapter>