Advice on: sooner, faster, smaller, stingier

Please advise us of other "helpful hints" that should go here!

Sooner: producing a program more quickly

Don't use -O or (especially) -O2:

By using them, you are telling GHC that you are willing to suffer longer compilation times for better-quality code.

GHC is surprisingly zippy for normal compilations without -O!

Use more memory:

Within reason, more memory for heap space means less garbage collection for GHC, which means less compilation time. If you use the -Rgc-stats option, you'll get a garbage-collector report. (Again, you can use the cheap-and-nasty -optCrts-Sstderr option to send the GC stats straight to standard error.)

If it says you're using more than 20% of total time in garbage collecting, then more memory would help.

If the heap size is approaching the maximum (64M by default), and you have lots of memory, try increasing the maximum with the -M<size> option, e.g.: ghc -c -O -M1024m Foo.hs.

Increasing the default allocation area size used by the compiler's RTS might also help: use the -A<size> option.

If GHC persists in being a bad memory citizen, please report it as a bug.

Don't use too much memory!
As soon as GHC plus its "fellow citizens" (other processes on your machine) start using more than the real memory on your machine, and the machine starts "thrashing," the party is over. Compile times will be worse than terrible! Use something like the csh-builtin time command to get a report on how many page faults you're getting.

If you don't know what virtual memory, thrashing, and page faults are, or you don't know the memory configuration of your machine, don't try to be clever about memory use: you'll just make your life a misery (and other people's, too, probably).

Try to use local disks when linking:

Because Haskell objects and libraries tend to be large, it can take many real seconds to slurp the bits to/from a remote filesystem.

It would be quite sensible to compile on a fast machine using remotely-mounted disks, then link on a slow machine that has your disks directly mounted.

Don't derive/use Read unnecessarily:

It's ugly and slow.

GHC compiles some program constructs slowly:

Deeply-nested list comprehensions seem to be one such; in the past, very large constant tables were bad, too.

We'd rather you reported such behaviour as a bug, so that we can try to correct it.

The parts of the compiler that seem most prone to wandering off for a long time are the abstract interpreters (strictness and update analysers). You can turn these off individually with -fno-strictness and -fno-update-analysis.

To figure out which part of the compiler is badly behaved, the -dshow-passes option is your friend.

If your module has big wads of constant data, GHC may produce a huge basic block that will cause the native-code generator's register allocator to founder. Bring on -fvia-C (not that GCC will be that quick about it, either).
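When a value only ever needs to be read from a handful of fixed spellings, a small hand-written function can stand in for a derived Read instance, in the spirit of the advice above not to derive or use Read unnecessarily. This is a minimal sketch in present-day Haskell; Colour, parseColour and parseColours are hypothetical names, not anything from GHC or its libraries. Load it into GHCi, or call the functions from your own main:

```haskell
-- A hypothetical enumeration. We derive Eq and Show,
-- but deliberately do not derive Read.
data Colour = Red | Green | Blue
  deriving (Eq, Show)

-- Hand-written parser: accepts exactly the spellings we expect
-- and rejects everything else.
parseColour :: String -> Maybe Colour
parseColour "Red"   = Just Red
parseColour "Green" = Just Green
parseColour "Blue"  = Just Blue
parseColour _       = Nothing

-- Parse a whitespace-separated list, failing if any word is unknown.
parseColours :: String -> Maybe [Colour]
parseColours = mapM parseColour . words
```

For example, parseColours "Red Blue" gives Just [Red, Blue], while parseColours "Red Mauve" gives Nothing.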
Avoid the consistency-check on linking:

Use -no-link-chk; it saves effort. This is probably safe in an I-only-compile-things-one-way setup.

Explicit import declarations:

Instead of saying import Foo, say import Foo (...stuff I want...).

Truthfully, the reduction in compilation time will be very small. However, judicious use of import declarations can make a program easier to understand, so it may be a good idea anyway.

Faster: producing a program that runs quicker

The key tools to use in making your Haskell program run faster are GHC's profiling facilities, described separately in . There is no substitute for finding where your program's time/space is really going, as opposed to where you imagine it is going.

Another point to bear in mind: by far the best way to improve a program's performance dramatically is to use better algorithms. Once profiling has thrown the spotlight on the guilty time-consumer(s), it may be better to re-think your program than to try all the tweaks listed below.

Another extremely efficient way to make your program snappy is to use library code that has been Seriously Tuned By Someone Else. You might be able to write a better quicksort than the one in the HBC library, but it will take you much longer than typing import QSort. (Incidentally, it doesn't hurt if the Someone Else is Lennart Augustsson.)

Please report any overly-slow GHC-compiled programs. The current definition of "overly-slow" is "the HBC-compiled version ran faster"…

Optimise, using -O or -O2:

This is the most basic way to make your program go faster. Compilation will be slower, especially with -O2.

At present, -O2 is nearly indistinguishable from -O.

Compile via C and crank up GCC:

Even with -O, GHC tries to use a native-code generator, if available.
But the native code-generator is designed to be quick, not mind-bogglingly clever. Better to let GCC have a go, as it tries much harder on register allocation, etc.

So, when we want very fast code, we use: -O -fvia-C -O2-for-C.

Overloaded functions are not your friend:

Haskell's overloading (using type classes) is elegant, neat, etc., etc., but it is death to performance if left to linger in an inner loop. How can you squash it?

Give explicit type signatures:

Signatures are the basic trick; putting them on exported, top-level functions is good software-engineering practice, anyway. (Tip: using the -fwarn-missing-signatures option can help enforce good signature practice.)

The automatic specialisation of overloaded functions (with -O) should take care of overloaded local and/or unexported functions.

Use SPECIALIZE pragmas:

Specialize the overloading on key functions in your program. See and .

"But how do I know where overloading is creeping in?":

A low-tech way: grep (search) your interface files for overloaded type signatures, e.g.:

    % egrep '^[a-z].*::.*=>' *.hi

Strict functions are your dear friends:

And, among other things, lazy pattern-matching is your enemy.

(If you don't know what a "strict function" is, please consult a functional-programming textbook. A sentence or two of explanation here probably would not do much good.)

Consider these two code fragments:

    f (Wibble x y) = ...                       -- strict

    f arg = let { (Wibble x y) = arg } in ...  -- lazy

The former will result in far better code.

A less contrived example shows the use of cases instead of lets to get stricter code (a good thing):

    f (Wibble x y)   -- beautiful but slow
      = let
          (a1, b1, c1) = unpackFoo x
          (a2, b2, c2) = unpackFoo y
        in ...

    f (Wibble x y)   -- ugly, and proud of it
      = case (unpackFoo x) of { (a1, b1, c1) ->
        case (unpackFoo y) of { (a2, b2, c2) ->
        ...
        }}

GHC loves single-constructor data-types:

It's all the better if a function is strict in a single-constructor type (a type with only one data constructor; for example, tuples are single-constructor types).

Newtypes are better than datatypes:

If your datatype has a single constructor with a single field, use a newtype declaration instead of a data declaration. The newtype will be optimised away in most cases.

"How do I find out a function's strictness?"

Don't guess; look it up.

Look for your function in the interface file, then for the third field in the pragma; it should say _S_ <string>. The <string> gives the strictness of the function's arguments: L is lazy (bad), S and E are strict (good), P is "primitive" (good), U(...) is strict and "unpackable" (very good), and A is absent (very good).

For an "unpackable" U(...) argument, the info inside tells the strictness of its components. So, if the argument is a pair, and it says U(AU(LSS)), that means "the first component of the pair isn't used; the second component is itself unpackable, with three components (lazy in the first, strict in the second and third)."

If the function isn't exported, just compile with the extra flag -ddump-simpl; next to the signature for any binder, it will print the self-same pragmatic information as would be put in an interface file. (Besides, Core syntax is fun to look at!)

Force key functions to be INLINEd (esp. monads):

Placing INLINE pragmas on certain functions that are used a lot can have a dramatic effect. See .

Explicit export list:

If you do not have an explicit export list in a module, GHC must assume that everything in that module will be exported. This has various pessimising effects.
For example, if a bit of code is actually unused (perhaps because of unfolding effects), GHC will not be able to throw it away, because it is exported and some other module may be relying on its existence.

GHC can be quite a bit more aggressive with pieces of code if it knows they are not exported.

Look at the Core syntax!

(The form in which GHC manipulates your code.) Just run your compilation with -ddump-simpl (don't forget the -O).

If profiling has pointed the finger at particular functions, look at their Core code. lets are bad, cases are good, dictionaries (d.<Class>.<Unique>) [or anything overloading-ish] are bad, nested lambdas are bad, explicit data constructors are good, primitive operations (e.g., eqInt#) are good,…

Use unboxed types (a GHC extension):

Use them when you are really desperate for speed and want to get right down to the "raw bits." Please see for some information about using unboxed types.

Use _ccall_s (a GHC extension) to plug into fast libraries:

This may take real work, but… there exist piles of massively-tuned library code, and the best thing is not to compete with it, but to link with it.

says a little about how to use C calls.

Don't use Floats:

We don't provide specialisations of Prelude functions for Float (but we do for Double). If you end up executing overloaded code, you will lose on performance, perhaps badly.

Floats (probably 32 bits) are almost always a bad idea, anyway, unless you Really Know What You Are Doing. Use Doubles. There's rarely a speed disadvantage: modern machines will use the same floating-point unit for both. With Doubles, you are much less likely to hang yourself with numerical errors.

One time when Float might be a good idea is if you have a lot of them, say a giant array of Floats. They take up half the space in the heap compared to Doubles. However, this isn't true on a 64-bit machine.
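Several of the tips above can be seen together in one small module. This is a sketch in present-day Haskell, not code from this guide. Wibble and unpackFoo echo the fragments in the text, but their definitions here are made up, and sumSquares, Metres, totalMetres, wibbleTotal and accumulate are hypothetical names. Whether GHC actually removes the dictionary or the newtype wrapper depends on the compiler version and flags; looking at the Core with -ddump-simpl (and -O), as described above, is the way to check. Load the module into GHCi, or call the functions from your own main:

```haskell
import Data.List (foldl')

-- Overloaded: a call at an unknown type passes a Num dictionary.
sumSquares :: Num a => [a] -> a
sumSquares = foldl' (\acc x -> acc + x * x) 0

-- Ask GHC for a dictionary-free copy at Double.
{-# SPECIALIZE sumSquares :: [Double] -> Double #-}

-- A single-constructor, single-field type: declared as a newtype,
-- the wrapper should cost nothing at run time.
newtype Metres = Metres Double

totalMetres :: [Metres] -> Metres
totalMetres ms = Metres (sum [d | Metres d <- ms])

-- Stand-ins for Wibble and unpackFoo from the text (bodies are made up).
data Wibble = Wibble Int Int

unpackFoo :: Int -> (Int, Int, Int)
unpackFoo n = (n, n + 1, n + 2)

-- The "ugly, and proud of it" style: each case takes its tuple apart
-- immediately instead of leaving lazy let-bound patterns behind.
wibbleTotal :: Wibble -> Int
wibbleTotal (Wibble x y) =
  case unpackFoo x of
    (a1, b1, c1) ->
      case unpackFoo y of
        (a2, b2, c2) -> a1 + b1 + c1 + a2 + b2 + c2

-- Float versus Double: add 0.1 a million times and measure the drift
-- away from the exact answer, 100000.
accumulate :: Fractional a => Int -> a
accumulate n = foldl' (+) 0 (replicate n 0.1)

floatError :: Double
floatError = abs (realToFrac (accumulate 1000000 :: Float) - 100000)

doubleError :: Double
doubleError = abs (accumulate 1000000 - 100000)
```

Comparing floatError with doubleError illustrates the numerical-error point: the Float total ends up much further from 100000 than the Double one.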
Use a bigger heap!

If your program's GC stats (-S RTS option) indicate that it's doing lots of garbage collection (say, more than 20% of execution time), more memory might help: use the -M<size> or -A<size> RTS options (see ).

Smaller: producing a program that is smaller

Decrease the "go-for-it" threshold for unfolding smallish expressions. Give a -funfolding-use-threshold0 option for the extreme case. ("Only unfoldings with zero cost should proceed.") Warning: except in certain specialised cases (like Happy parsers) this is likely to actually increase the size of your program, because unfolding generally enables extra simplifying optimisations to be performed.

Avoid Read.

Use strip on your executables.

Stingier: producing a program that gobbles less heap space

"I think I have a space leak…" Re-run your program with +RTS -Sstderr and remove all doubt! (You'll see the heap usage get bigger and bigger…) [Hmmm… this might be even easier with the -F2s RTS option; so… ./a.out +RTS -Sstderr -F2s...]

Once again, the profiling facilities () are the basic tool for demystifying the space behaviour of your program.

Strict functions are good for space usage, as they are for time, as discussed in the previous section. Strict functions get right down to business, rather than filling up the heap with closures (the system's notes to itself about how to evaluate something, should it eventually be required).
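The closure build-up described above can be reproduced with a running mean. This is a sketch in present-day Haskell; meanLazy and meanStrict are hypothetical names. Both use foldl', which evaluates only the outer pair at each step. In meanLazy the two running totals therefore grow into chains of unevaluated closures, while meanStrict uses seq to evaluate them as it goes. When compiling with -O, GHC's strictness analysis may well rescue the lazy version for you, so the difference is easiest to see unoptimised (for example in GHCi) with the GC statistics options described above turned on:

```haskell
import Data.List (foldl')

-- Leaky: foldl' evaluates the pair itself but not its components,
-- so each running total becomes a growing chain of (+) closures.
meanLazy :: [Double] -> Double
meanLazy xs = total / fromIntegral count
  where
    (total, count) = foldl' step (0, 0 :: Int) xs
    step (acc, len) x = (acc + x, len + 1)

-- Stingy: the step function is strict in both components, so each
-- new total is computed as soon as it is built.
meanStrict :: [Double] -> Double
meanStrict xs = total / fromIntegral count
  where
    (total, count) = foldl' step (0, 0 :: Int) xs
    step (acc, len) x =
      let acc' = acc + x
          len' = len + 1
      in acc' `seq` len' `seq` (acc', len')
```

Both functions compute the same answer; only the amount of heap occupied by pending closures along the way differs.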