<?xml version="1.0" encoding="iso-8859-1"?>
<chapter id="profiling">
<title>Profiling</title>
<indexterm><primary>profiling</primary></indexterm>
<indexterm><primary>cost-centre profiling</primary></indexterm>
<para>Glasgow Haskell comes with a time and space profiling
system. Its purpose is to help you improve your understanding of
your program's execution behaviour, so you can improve it.</para>

<para>Any comments, suggestions and/or improvements you have are
welcome. Recommended &#8220;profiling tricks&#8221; would be
especially cool!</para>

<para>Profiling a program is a three-step process:</para>
<para>Re-compile your program for profiling with the
<literal>-prof</literal> option, and probably one of the
<literal>-auto</literal> or <literal>-auto-all</literal>
options. These options are described in more detail in <xref
linkend="prof-compiler-options"/>.</para>
<indexterm><primary><literal>-prof</literal></primary></indexterm>
<indexterm><primary><literal>-auto</literal></primary></indexterm>
<indexterm><primary><literal>-auto-all</literal></primary></indexterm>
<para>Run your program with one of the profiling options, eg.
<literal>+RTS -p -RTS</literal>. This generates a file of
profiling information. Note that multi-processor execution
(e.g. <literal>+RTS -N2</literal>) is not supported while
profiling.</para>
<indexterm><primary><option>-p</option></primary><secondary>RTS
option</secondary></indexterm>
<para>Examine the generated profiling information, using one of
GHC's profiling tools. The tool to use will depend on the kind
of profiling information generated.</para>

<sect1 id="cost-centres">
<title>Cost centres and cost-centre stacks</title>
<para>GHC's profiling system assigns <firstterm>costs</firstterm>
to <firstterm>cost centres</firstterm>. A cost is simply the time
or space required to evaluate an expression. Cost centres are
program annotations around expressions; all costs incurred by the
annotated expression are assigned to the enclosing cost centre.
Furthermore, GHC will remember the stack of enclosing cost centres
for any given expression at run-time and generate a call-graph of
cost attributions.</para>
<para>Let's take a look at an example:</para>

<programlisting>
main = print (nfib 25)
nfib n = if n &lt; 2 then 1 else nfib (n-1) + nfib (n-2)
</programlisting>
<para>Compile and run this program as follows:</para>

<screen>
$ ghc -prof -auto-all -o Main Main.hs
$ ./Main +RTS -p
</screen>
<para>When a GHC-compiled program is run with the
<option>-p</option> RTS option, it generates a file called
<filename><replaceable>prog</replaceable>.prof</filename>. In this
case, the file will contain something like this:</para>
<screen>
	Fri May 12 14:06 2000 Time and Allocation Profiling Report  (Final)

	total time  =      0.14 secs   (7 ticks @ 20 ms)
	total alloc = 8,741,204 bytes  (excludes profiling overheads)

COST CENTRE  MODULE   %time %alloc

nfib         Main     100.0  100.0

                                            individual    inherited
COST CENTRE  MODULE      entries %time %alloc  %time %alloc

MAIN         MAIN              0   0.0   0.0   100.0 100.0
 main        Main              0   0.0   0.0     0.0   0.0
 CAF         PrelHandle        3   0.0   0.0     0.0   0.0
 CAF         PrelAddr          1   0.0   0.0     0.0   0.0
 CAF         Main              6   0.0   0.0   100.0 100.0
  main       Main              1   0.0   0.0   100.0 100.0
   nfib      Main         242785 100.0 100.0   100.0 100.0
</screen>
<para>The first part of the file gives the program name and
options, and the total time and total memory allocation measured
during the run of the program (note that the total memory
allocation figure isn't the same as the amount of
<emphasis>live</emphasis> memory needed by the program at any one
time; the latter can be determined using heap profiling, which we
will describe shortly).</para>
<para>The second part of the file is a break-down by cost centre
of the most costly functions in the program. In this case, there
was only one significant function in the program, namely
<function>nfib</function>, and it was responsible for 100&percnt;
of both the time and allocation costs of the program.</para>

<para>The third and final section of the file gives a profile
break-down by cost-centre stack. This is roughly a call-graph
profile of the program. In the example above, it is clear that
the costly call to <function>nfib</function> came from
<function>main</function>.</para>

<para>The time and allocation incurred by a given part of the
program is displayed in two ways: &#8220;individual&#8221;, which
are the costs incurred by the code covered by this cost centre
stack alone, and &#8220;inherited&#8221;, which includes the costs
incurred by all the children of this node.</para>

<para>The usefulness of cost-centre stacks is better demonstrated
by modifying the example slightly:</para>
<programlisting>
main = print (f 25 + g 25)
f n  = nfib n
g n  = nfib (n `div` 2)
nfib n = if n &lt; 2 then 1 else nfib (n-1) + nfib (n-2)
</programlisting>

<para>Compile and run this program as before, and take a look at
the new profiling results:</para>
<screen>
COST CENTRE  MODULE      entries %time %alloc  %time %alloc

MAIN         MAIN              0   0.0   0.0   100.0 100.0
 main        Main              0   0.0   0.0     0.0   0.0
 CAF         PrelHandle        3   0.0   0.0     0.0   0.0
 CAF         PrelAddr          1   0.0   0.0     0.0   0.0
 CAF         Main              9   0.0   0.0   100.0 100.0
  main       Main              1   0.0   0.0   100.0 100.0
   g         Main              1   0.0   0.0     0.0   0.2
    nfib     Main            465   0.0   0.2     0.0   0.2
   f         Main              1   0.0   0.0   100.0  99.8
    nfib     Main         242785 100.0  99.8   100.0  99.8
</screen>
<para>Now although we had two calls to <function>nfib</function>
in the program, it is immediately clear that it was the call from
<function>f</function> which took all the time.</para>

<para>The actual meaning of the various columns in the output is:</para>

<term>entries</term>
<para>The number of times this particular point in the call
graph was entered.</para>
<term>individual &percnt;time</term>
<para>The percentage of the total run time of the program
spent at this point in the call graph.</para>

<term>individual &percnt;alloc</term>
<para>The percentage of the total memory allocations
(excluding profiling overheads) of the program made by this
call alone.</para>

<term>inherited &percnt;time</term>
<para>The percentage of the total run time of the program
spent below this point in the call graph.</para>

<term>inherited &percnt;alloc</term>
<para>The percentage of the total memory allocations
(excluding profiling overheads) of the program made by this
call and all of its sub-calls.</para>
<para>In addition you can use the <option>-P</option> RTS option
<indexterm><primary><option>-P</option></primary></indexterm> to
get the following additional information:</para>

<term><literal>ticks</literal></term>
<para>The raw number of time &#8220;ticks&#8221; which were
attributed to this cost-centre; from this, we get the
<literal>&percnt;time</literal> figure mentioned above.</para>

<term><literal>bytes</literal></term>
<para>Number of bytes allocated in the heap while in this
cost-centre; again, this is the raw number from which we get
the <literal>&percnt;alloc</literal> figure mentioned above.</para>
<para>What about recursive functions, and mutually recursive
groups of functions? Where are the costs attributed? Well,
although GHC does keep information about which groups of functions
called each other recursively, this information isn't displayed in
the basic time and allocation profile; instead, the call-graph is
flattened into a tree.</para>
<sect2><title>Inserting cost centres by hand</title>

<para>Cost centres are just program annotations. When you say
<option>-auto-all</option> to the compiler, it automatically
inserts a cost centre annotation around every top-level function
not marked INLINE in your program, but you are entirely free to
add the cost centre annotations yourself.</para>

<para>The syntax of a cost centre annotation is</para>

<programlisting>
{-# SCC "name" #-} &lt;expression&gt;
</programlisting>

<para>where <literal>"name"</literal> is an arbitrary string
that will become the name of your cost centre as it appears
in the profiling output, and
<literal>&lt;expression&gt;</literal> is any Haskell
expression. An <literal>SCC</literal> annotation extends as
far to the right as possible when parsing. (SCC stands for "Set
Cost Centre").</para>
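<para>Because the annotation extends as far to the right as
possible, you need parentheses to give it a narrower scope. A
small sketch (the names here are illustrative, not from the
manual; the pragma is simply ignored when compiling without
<option>-prof</option>):</para>

```haskell
-- Without the parentheses, the SCC would cover the entire
-- right-hand side; with them, only 'sum xs' is attributed to it.
mean :: [Double] -> Double
mean xs = ({-# SCC "numerator" #-} sum xs) / fromIntegral (length xs)

main :: IO ()
main = print (mean [1,2,3,4])
```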
<para>Here is an example of a program with a couple of SCCs:</para>

<programlisting>
main = do let xs = {-# SCC "X" #-} [1..1000000]
          let ys = {-# SCC "Y" #-} [1..2000000]
          print $ last $ init xs
          print $ last $ init ys
</programlisting>
<para>which gives this heap profile when run:</para>

<!-- contentwidth/contentheight don't appear to have any effect
other than making the PS file generation work, rather than
falling over. The result seems to be broken PS on the page -->
<imagedata fileref="prof_scc" contentwidth="645px"
contentdepth="428px"/>
<sect2 id="prof-rules">
<title>Rules for attributing costs</title>

<para>The cost of evaluating any expression in your program is
attributed to a cost-centre stack using the following rules:</para>

<para>If the expression is part of the
<firstterm>one-off</firstterm> costs of evaluating the
enclosing top-level definition, then costs are attributed to
the stack of lexically enclosing <literal>SCC</literal>
annotations on top of the special <literal>CAF</literal>
cost-centre for the enclosing module.</para>

<para>Otherwise, costs are attributed to the stack of
lexically-enclosing <literal>SCC</literal> annotations,
appended to the cost-centre stack in effect at the
<firstterm>call site</firstterm> of the current top-level
definition<footnote><para>The call-site is just the place
in the source code which mentions the particular function or
variable.</para></footnote>. Notice that this is a recursive
definition.</para>
<para>Time spent in foreign code (see <xref linkend="ffi"/>)
is always attributed to the cost centre in force at the
Haskell call-site of the foreign function.</para>
<para>What do we mean by one-off costs? Well, Haskell is a lazy
language, and certain expressions are only ever evaluated once.
For example, if we write:</para>

<programlisting>
x = nfib 25
</programlisting>
<para>then <varname>x</varname> will only be evaluated once (if
at all), and subsequent demands for <varname>x</varname> will
immediately get to see the cached result. The definition of
<varname>x</varname> is called a CAF (Constant Applicative
Form), because it has no arguments.</para>

<para>For the purposes of profiling, we say that the expression
<literal>nfib 25</literal> belongs to the one-off costs of
evaluating <varname>x</varname>.</para>
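<para>The evaluate-once behaviour of a CAF can be seen directly
with <literal>Debug.Trace</literal>. A sketch (the message and
names are ours, purely for illustration):</para>

```haskell
import Debug.Trace (trace)

-- 'x' is a CAF: it has no arguments, so its value is computed at
-- most once and then cached; the trace message appears only on the
-- first demand.
x :: Integer
x = trace "evaluating x" (sum [1 .. 1000000])

main :: IO ()
main = do
  print x   -- pays the one-off cost: message, then 500000500000
  print x   -- cached result; no second message
```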
<para>Since one-off costs aren't strictly speaking part of the
call-graph of the program, they are attributed to a special
top-level cost centre, <literal>CAF</literal>. There may be one
<literal>CAF</literal> cost centre for each module (the
default), or one for each top-level definition with any one-off
costs (this behaviour can be selected by giving GHC the
<option>-caf-all</option> flag).</para>
<indexterm><primary><literal>-caf-all</literal></primary></indexterm>

<para>If you think you have a weird profile, or the call-graph
doesn't look like you expect it to, feel free to send it (and
your program) to us at
<email>glasgow-haskell-bugs@haskell.org</email>.</para>
<sect1 id="prof-compiler-options">
<title>Compiler options for profiling</title>

<indexterm><primary>profiling</primary><secondary>options</secondary></indexterm>
<indexterm><primary>options</primary><secondary>for profiling</secondary></indexterm>

<option>-prof</option>:
<indexterm><primary><option>-prof</option></primary></indexterm>

<para>To make use of the profiling system
<emphasis>all</emphasis> modules must be compiled and linked
with the <option>-prof</option> option. Any
<literal>SCC</literal> annotations you've put in your source
will spring to life.</para>

<para>Without a <option>-prof</option> option, your
<literal>SCC</literal>s are ignored; so you can compile
<literal>SCC</literal>-laden code without changing it.</para>

<para>There are a few other profiling-related compilation options.
Use them <emphasis>in addition to</emphasis>
<option>-prof</option>. These do not have to be used consistently
for all modules in a program.</para>
<option>-auto</option>:
<indexterm><primary><option>-auto</option></primary></indexterm>
<indexterm><primary>cost centres</primary><secondary>automatically inserting</secondary></indexterm>

<para>GHC will automatically add
<function>_scc_</function> constructs for all
top-level, exported functions not marked INLINE. If you
want a cost centre on an INLINE function, you have to add
it by hand.</para>

<option>-auto-all</option>:
<indexterm><primary><option>-auto-all</option></primary></indexterm>

<para><emphasis>All</emphasis> top-level functions
not marked INLINE, exported or not, will be automatically
<function>_scc_</function>'d.
The functions marked INLINE must be given a cost centre
by hand.</para>

<option>-caf-all</option>:
<indexterm><primary><option>-caf-all</option></primary></indexterm>

<para>The costs of all CAFs in a module are usually
attributed to one &#8220;big&#8221; CAF cost-centre. With
this option, all CAFs get their own cost-centre. An
&#8220;if all else fails&#8221; option&#8230;</para>

<option>-ignore-scc</option>:
<indexterm><primary><option>-ignore-scc</option></primary></indexterm>

<para>Ignore any <function>_scc_</function>
constructs, so a module which already has
<function>_scc_</function>s can be compiled
for profiling with the annotations ignored.</para>
<sect1 id="prof-time-options">
<title>Time and allocation profiling</title>

<para>To generate a time and allocation profile, give one of the
following RTS options to the compiled program when you run it (RTS
options should be enclosed between <literal>+RTS</literal> and
<literal>-RTS</literal> as usual):</para>

<option>-p</option> or <option>-P</option> or <option>-pa</option>:
<indexterm><primary><option>-p</option></primary></indexterm>
<indexterm><primary><option>-P</option></primary></indexterm>
<indexterm><primary><option>-pa</option></primary></indexterm>
<indexterm><primary>time profile</primary></indexterm>

<para>The <option>-p</option> option produces a standard
<emphasis>time profile</emphasis> report. It is written
into the file
<filename><replaceable>program</replaceable>.prof</filename>.</para>

<para>The <option>-P</option> option produces a more
detailed report containing the actual time and allocation
data as well. (Not used much.)</para>

<para>The <option>-pa</option> option produces the most detailed
report containing all cost centres in addition to the actual time
and allocation data.</para>
<term><option>-V<replaceable>secs</replaceable></option>
<indexterm><primary><option>-V</option></primary><secondary>RTS
option</secondary></indexterm></term>

<para>Sets the interval that the RTS clock ticks at, which is
also the sampling interval of the time and allocation profile.
The default is 0.02 seconds.</para>

<term><option>-xc</option></term>
<indexterm><primary><option>-xc</option></primary><secondary>RTS option</secondary></indexterm>

<para>This option makes use of the extra information
maintained by the cost-centre-stack profiler to provide
useful information about the location of runtime errors.
See <xref linkend="rts-options-debugging"/>.</para>
<sect1 id="prof-heap">
<title>Profiling memory usage</title>

<para>In addition to profiling the time and allocation behaviour
of your program, you can also generate a graph of its memory usage
over time. This is useful for detecting the causes of
<firstterm>space leaks</firstterm>, when your program holds on to
more memory at run-time than it needs to. Space leaks lead to
longer run-times due to heavy garbage collector activity, and may
even cause the program to run out of memory altogether.</para>

<para>To generate a heap profile from your program:</para>

<para>Compile the program for profiling (<xref
linkend="prof-compiler-options"/>).</para>

<para>Run it with one of the heap profiling options described
below (eg. <option>-hc</option> for a basic producer profile).
This generates the file
<filename><replaceable>prog</replaceable>.hp</filename>.</para>

<para>Run <command>hp2ps</command> to produce a PostScript
file, <filename><replaceable>prog</replaceable>.ps</filename>. The
<command>hp2ps</command> utility is described in detail in
<xref linkend="hp2ps"/>.</para>

<para>Display the heap profile using a PostScript viewer such
as <application>Ghostview</application>, or print it out on a
PostScript-capable printer.</para>

<para>You might also want to take a look
at <ulink url="http://www.haskell.org/haskellwiki/Hp2any">hp2any</ulink>,
a more advanced suite of tools (not distributed with GHC) for
displaying heap profiles.</para>
<sect2 id="rts-options-heap-prof">
<title>RTS options for heap profiling</title>

<para>There are several different kinds of heap profile that can
be generated. All the different profile types yield a graph of
live heap against time, but they differ in how the live heap is
broken down into bands. The following RTS options select which
break-down to use:</para>

<indexterm><primary><option>-hc</option></primary><secondary>RTS option</secondary></indexterm>

<para>Breaks down the graph by the cost-centre stack which
produced the data.</para>

<indexterm><primary><option>-hm</option></primary><secondary>RTS option</secondary></indexterm>

<para>Breaks down the live heap by the module containing
the code which produced the data.</para>

<indexterm><primary><option>-hd</option></primary><secondary>RTS option</secondary></indexterm>

<para>Breaks down the graph by <firstterm>closure
description</firstterm>. For actual data, the description
is just the constructor name; for other closures it is a
compiler-generated string identifying the closure.</para>

<indexterm><primary><option>-hy</option></primary><secondary>RTS option</secondary></indexterm>

<para>Breaks down the graph by
<firstterm>type</firstterm>. For closures which have
function type or unknown/polymorphic type, the string will
represent an approximation to the actual type.</para>

<indexterm><primary><option>-hr</option></primary><secondary>RTS option</secondary></indexterm>

<para>Breaks down the graph by <firstterm>retainer
set</firstterm>. Retainer profiling is described in more
detail below (<xref linkend="retainer-prof"/>).</para>

<indexterm><primary><option>-hb</option></primary><secondary>RTS option</secondary></indexterm>

<para>Breaks down the graph by
<firstterm>biography</firstterm>. Biographical profiling
is described in more detail below (<xref
linkend="biography-prof"/>).</para>
<para>In addition, the profile can be restricted to heap data
which satisfies certain criteria - for example, you might want
to display a profile by type but only for data produced by a
certain module, or a profile by retainer for a certain type of
data. Restrictions are specified as follows:</para>
<option>-hc</option><replaceable>name</replaceable>,...
<indexterm><primary><option>-hc</option></primary><secondary>RTS option</secondary></indexterm>

<para>Restrict the profile to closures produced by
cost-centre stacks with one of the specified cost centres
at the top.</para>

<option>-hC</option><replaceable>name</replaceable>,...
<indexterm><primary><option>-hC</option></primary><secondary>RTS option</secondary></indexterm>

<para>Restrict the profile to closures produced by
cost-centre stacks with one of the specified cost centres
anywhere in the stack.</para>

<option>-hm</option><replaceable>module</replaceable>,...
<indexterm><primary><option>-hm</option></primary><secondary>RTS option</secondary></indexterm>

<para>Restrict the profile to closures produced by the
specified modules.</para>

<option>-hd</option><replaceable>desc</replaceable>,...
<indexterm><primary><option>-hd</option></primary><secondary>RTS option</secondary></indexterm>

<para>Restrict the profile to closures with the specified
description strings.</para>

<option>-hy</option><replaceable>type</replaceable>,...
<indexterm><primary><option>-hy</option></primary><secondary>RTS option</secondary></indexterm>

<para>Restrict the profile to closures with the specified
types.</para>

<option>-hr</option><replaceable>cc</replaceable>,...
<indexterm><primary><option>-hr</option></primary><secondary>RTS option</secondary></indexterm>

<para>Restrict the profile to closures with retainer sets
containing cost-centre stacks with one of the specified
cost centres at the top.</para>

<option>-hb</option><replaceable>bio</replaceable>,...
<indexterm><primary><option>-hb</option></primary><secondary>RTS option</secondary></indexterm>

<para>Restrict the profile to closures with one of the
specified biographies, where
<replaceable>bio</replaceable> is one of
<literal>lag</literal>, <literal>drag</literal>,
<literal>void</literal>, or <literal>use</literal>.</para>
<para>For example, the following options will generate a
retainer profile restricted to <literal>Branch</literal> and
<literal>Leaf</literal> constructors:</para>

<screen>
<replaceable>prog</replaceable> +RTS -hr -hdBranch,Leaf
</screen>

<para>There can only be one "break-down" option
(eg. <option>-hr</option> in the example above), but there is no
limit on the number of further restrictions that may be applied.
All the options may be combined, with one exception: GHC doesn't
currently support mixing the <option>-hr</option> and
<option>-hb</option> options.</para>
<para>There are three more options which relate to heap
profiling:</para>

<option>-i<replaceable>secs</replaceable></option>:
<indexterm><primary><option>-i</option></primary></indexterm>

<para>Set the profiling (sampling) interval to
<replaceable>secs</replaceable> seconds (the default is
0.1 seconds). Fractions are allowed: for example
<option>-i0.2</option> will get 5 samples per second.
This only affects heap profiling; time profiles are always
sampled with the frequency of the RTS clock. See
<xref linkend="prof-time-options"/> for changing that.</para>
<indexterm><primary><option>-xt</option></primary><secondary>RTS option</secondary></indexterm>

<para>Include the memory occupied by threads in a heap
profile. Each thread takes up a small area for its thread
state in addition to the space allocated for its stack
(stacks normally start small and then grow as
necessary).</para>

<para>This includes the main thread, so using
<option>-xt</option> is a good way to see how much stack
space the program is using.</para>

<para>Memory occupied by threads and their stacks is
labelled as &#8220;TSO&#8221; when displaying the profile
by closure description or type description.</para>

<option>-L<replaceable>num</replaceable></option>
<indexterm><primary><option>-L</option></primary><secondary>RTS option</secondary></indexterm>

<para>Sets the maximum length of a cost-centre stack name in a
heap profile. Defaults to 25.</para>
<sect2 id="retainer-prof">
<title>Retainer Profiling</title>

<para>Retainer profiling is designed to help answer questions
like <quote>why is this data being retained?</quote>. We start
by defining what we mean by a retainer:</para>

<para>A retainer is either the system stack, or an unevaluated
closure (thunk).</para>

<para>In particular, constructors are <emphasis>not</emphasis>
retainers.</para>

<para>An object B retains object A if (i) B is a retainer object and
(ii) object A can be reached by recursively following pointers
starting from object B, but not meeting any other retainer
objects on the way. Each live object is retained by one or more
retainer objects, collectively called its
<firstterm>retainer set</firstterm>, or its
<firstterm>retainers</firstterm>.</para>
<para>When retainer profiling is requested by giving the program
the <option>-hr</option> option, a graph is generated which is
broken down by retainer set. A retainer set is displayed as a
set of cost-centre stacks; because this is usually too large to
fit on the profile graph, each retainer set is numbered and
shown abbreviated on the graph along with its number, and the
full list of retainer sets is dumped into the file
<filename><replaceable>prog</replaceable>.prof</filename>.</para>

<para>Retainer profiling requires multiple passes over the live
heap in order to discover the full retainer set for each
object, which can be quite slow. So we set a limit on the
maximum size of a retainer set, where all retainer sets larger
than the maximum retainer set size are replaced by the special
set <literal>MANY</literal>. The maximum set size defaults to 8
and can be altered with the <option>-R</option> RTS
option:</para>

<term><option>-R</option><replaceable>size</replaceable></term>
<para>Restrict the number of elements in a retainer set to
<replaceable>size</replaceable> (default 8).</para>
<title>Hints for using retainer profiling</title>

<para>The definition of retainers is designed to reflect a
common cause of space leaks: a large structure is retained by
an unevaluated computation, and will be released once the
computation is forced. A good example is looking up a value in
a finite map, where unless the lookup is forced in a timely
manner the unevaluated lookup will cause the whole mapping to
be retained. These kinds of space leaks can often be
eliminated by forcing the relevant computations to be
performed eagerly, using <literal>seq</literal> or strictness
annotations on data constructor fields.</para>
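<para>A minimal sketch of the finite-map case (the names and
numbers are illustrative; <literal>evaluate</literal> from
<literal>Control.Exception</literal> plays the role of
<literal>seq</literal> here, forcing the lookup at a definite
point in the program):</para>

```haskell
import qualified Data.Map as Map
import Control.Exception (evaluate)

main :: IO ()
main = do
  let m = Map.fromList (zip [1 .. 100000] [1 .. 100000 :: Int])
      -- 'v' is a thunk; left unevaluated, it retains all of 'm'.
      v = Map.findWithDefault 0 42 m
  -- Force the lookup now, so 'm' can be garbage collected.
  evaluate v >>= print
```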
<para>Often a particular data structure is being retained by a
chain of unevaluated closures, only the nearest of which will
be reported by retainer profiling - for example A retains B,
and B retains a large structure. There might be a
large number of Bs but only a single A, so A is really the one
we're interested in eliminating. However, retainer profiling
will in this case report B as the retainer of the large
structure. To move further up the chain of retainers, we can
ask for another retainer profile, but this time restrict the
profile to B objects, so we get a profile of the retainers of
B:</para>

<screen>
<replaceable>prog</replaceable> +RTS -hr -hcB
</screen>

<para>This trick isn't foolproof, because there might be other
B closures in the heap which aren't the retainers we are
interested in, but we've found this to be a useful technique
in most cases.</para>
<sect2 id="biography-prof">
<title>Biographical Profiling</title>

<para>A typical heap object may be in one of the following four
states at each point in its lifetime:</para>

<para>the <firstterm>lag</firstterm> stage, which is the
time between creation and the first use of the object,</para>

<para>the <firstterm>use</firstterm> stage, which lasts from
the first use until the last use of the object, and</para>

<para>the <firstterm>drag</firstterm> stage, which lasts
from the final use until the last reference to the object
is dropped.</para>

<para>An object which is never used is said to be in the
<firstterm>void</firstterm> state for its whole
lifetime.</para>
<para>A biographical heap profile displays the portion of the
live heap in each of the four states listed above. Usually the
most interesting states are the void and drag states: live heap
in these states is more likely to be wasted space than heap in
the lag or use states.</para>

<para>It is also possible to break down the heap in one or more
of these states by a different criterion, by restricting a
profile by biography. For example, to show the portion of the
heap in the drag or void state by producer:</para>

<screen>
<replaceable>prog</replaceable> +RTS -hc -hbdrag,void
</screen>

<para>Once you know the producer or the type of the heap in the
drag or void states, the next step is usually to find the
retainer(s):</para>

<screen>
<replaceable>prog</replaceable> +RTS -hr -hc<replaceable>cc</replaceable>...
</screen>

<para>NOTE: this two-stage process is required because GHC
cannot currently profile using both biographical and retainer
information simultaneously.</para>
<sect2 id="mem-residency">
<title>Actual memory residency</title>

<para>How does the heap residency reported by the heap profiler relate to
the actual memory residency of your program when you run it? You might
see a large discrepancy between the residency reported by the heap
profiler, and the residency reported by tools on your system
(eg. <literal>ps</literal> or <literal>top</literal> on Unix, or the
Task Manager on Windows). There are several reasons for this:</para>

<para>There is an overhead of profiling itself, which is subtracted
from the residency figures by the profiler. This overhead goes
away when compiling without profiling support, of course. The
space overhead is currently 2 extra
words per heap object, which probably results in
about a 30&percnt; overhead.</para>
<para>Garbage collection requires more memory than the actual
residency. The factor depends on the kind of garbage collection
algorithm in use: a major GC in the standard
generational copying collector will usually require 3L bytes of
memory, where L is the amount of live data. This is because by
default (see the <option>+RTS -F</option> option) we allow the old
generation to grow to twice its size (2L) before collecting it, and
we require additionally L bytes to copy the live data into. When
using compacting collection (see the <option>+RTS -c</option>
option), this is reduced to 2L, and can further be reduced by
tweaking the <option>-F</option> option. Also add the size of the
allocation area (currently a fixed 512Kb).</para>
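<para>As a rough worked example of the 3L rule above (the function
name and figures are ours, purely illustrative):</para>

```haskell
-- Rough peak-memory estimate for a major GC in the default copying
-- collector: the old generation may grow to 2L before collection,
-- plus L bytes of copy space, plus the fixed 512Kb allocation area.
peakBytes :: Integer -> Integer
peakBytes liveBytes = 2 * liveBytes + liveBytes + 512 * 1024

main :: IO ()
main = print (peakBytes (100 * 1024 * 1024))  -- 100MB live: ~315MB peak
```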
<para>The stack isn't counted in the heap profile by default. See the
<option>+RTS -xt</option> option.</para>

<para>The program text itself, the C stack, any non-heap data (eg. data
allocated by foreign libraries, and data allocated by the RTS), and
<literal>mmap()</literal>'d memory are not counted in the heap profile.</para>
<title><command>hp2ps</command>--heap profile to PostScript</title>

<indexterm><primary><command>hp2ps</command></primary></indexterm>
<indexterm><primary>heap profiles</primary></indexterm>
<indexterm><primary>postscript, from heap profiles</primary></indexterm>
<indexterm><primary><option>-h&lt;break-down&gt;</option></primary></indexterm>

<screen>
hp2ps [flags] [&lt;file&gt;[.hp]]
</screen>

<para><command>hp2ps</command><indexterm><primary>hp2ps
program</primary></indexterm> converts a heap profile as produced
by the <option>-h&lt;break-down&gt;</option> runtime option into a
PostScript graph of the heap profile. By convention, the file to
be processed by <command>hp2ps</command> has a
<filename>.hp</filename> extension. The PostScript output is
written to <filename>&lt;file&gt;.ps</filename>. If
<filename>&lt;file&gt;</filename> is omitted entirely, then the
program behaves as a filter.</para>

<para><command>hp2ps</command> is distributed in
<filename>ghc/utils/hp2ps</filename> in a GHC source
distribution. It was originally developed by Dave Wakeling as part
of the HBC/LML heap profiler.</para>
<para>The flags are:</para>

<term><option>-d</option></term>

<para>In order to make graphs more readable,
<command>hp2ps</command> sorts the shaded bands for each
identifier. The default sort ordering is for the bands with
the largest area to be stacked on top of the smaller ones.
The <option>-d</option> option causes rougher bands (those
representing series of values with the largest standard
deviations) to be stacked on top of smoother ones.</para>
1048 <term><option>-b</option></term>
1050 <para>Normally, <command>hp2ps</command> puts the title of
1051 the graph in a small box at the top of the page. However, if
1052 the JOB string is too long to fit in a small box (more than
1053 35 characters), then <command>hp2ps</command> will choose to
1054 use a big box instead. The <option>-b</option> option
1055 forces <command>hp2ps</command> to use a big box.</para>
<term><option>-e&lt;float&gt;[in|mm|pt]</option></term>
1062 <para>Generate encapsulated PostScript suitable for
1063 inclusion in LaTeX documents. Usually, the PostScript graph
is drawn in landscape mode in an area 9 inches wide by 6
inches high, and <command>hp2ps</command> arranges for this
area to be approximately centred on a sheet of A4 paper.
This format is convenient for studying the graph in detail,
but it is unsuitable for inclusion in LaTeX documents. The
<option>-e</option> option causes the graph to be drawn in
portrait mode, with &lt;float&gt; specifying the width in inches,
millimetres or points (the default). The resulting
1072 PostScript file conforms to the Encapsulated PostScript
1073 (EPS) convention, and it can be included in a LaTeX document
1074 using Rokicki's dvi-to-PostScript converter
1075 <command>dvips</command>.</para>
1080 <term><option>-g</option></term>
1082 <para>Create output suitable for the <command>gs</command>
1083 PostScript previewer (or similar). In this case the graph is
1084 printed in portrait mode without scaling. The output is
1085 unsuitable for a laser printer.</para>
1090 <term><option>-l</option></term>
1092 <para>Normally a profile is limited to 20 bands with
1093 additional identifiers being grouped into an
1094 <literal>OTHER</literal> band. The <option>-l</option> flag
removes this 20-band limit, producing as many bands as
necessary. No key is produced, as it would not fit. This is useful
1097 for creation time profiles with many bands.</para>
<term><option>-m&lt;int&gt;</option></term>
1104 <para>Normally a profile is limited to 20 bands with
1105 additional identifiers being grouped into an
1106 <literal>OTHER</literal> band. The <option>-m</option> flag
specifies an alternative band limit (the maximum is 20).
<para><option>-m0</option> requests that the band limit be
removed: as many bands as necessary are produced. However, no
key is produced, as it would not fit. This is useful for displaying
creation-time profiles with many bands.</para>
1118 <term><option>-p</option></term>
1120 <para>Use previous parameters. By default, the PostScript
1121 graph is automatically scaled both horizontally and
1122 vertically so that it fills the page. However, when
1123 preparing a series of graphs for use in a presentation, it
1124 is often useful to draw a new graph using the same scale,
1125 shading and ordering as a previous one. The
1126 <option>-p</option> flag causes the graph to be drawn using
1127 the parameters determined by a previous run of
1128 <command>hp2ps</command> on <filename>file</filename>. These
are extracted from <filename>file.aux</filename>.</para>
1134 <term><option>-s</option></term>
1136 <para>Use a small box for the title.</para>
1141 <term><option>-t<float></option></term>
1143 <para>Normally trace elements which sum to a total of less
1144 than 1% of the profile are removed from the
1145 profile. The <option>-t</option> option allows this
1146 percentage to be modified (maximum 5%).</para>
<para><option>-t0</option> requests that no trace elements be
removed from the profile, ensuring that all the data will be
displayed.</para>
1155 <term><option>-c</option></term>
1157 <para>Generate colour output.</para>
1162 <term><option>-y</option></term>
1164 <para>Ignore marks.</para>
1169 <term><option>-?</option></term>
1171 <para>Print out usage information.</para>
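<para>To decide whether the default 20-band limit is enough, or what
value to pass to <option>-m</option>, you can count the distinct
identifiers in your <filename>.hp</filename> file with standard Unix
tools. The sketch below is illustrative only: the profile contents
are made up, and the file name <filename>BANDS.hp</filename> is not
special.</para>

```shell
# Create a small illustrative heap profile (contents are made up).
cat > BANDS.hp <<'EOF'
JOB "BANDS"
DATE "Thu Dec 26 18:17 2002"
SAMPLE_UNIT "seconds"
VALUE_UNIT "bytes"
BEGIN_SAMPLE 0.00
Main.reverse 400
Main.append 300
END_SAMPLE 0.00
BEGIN_SAMPLE 0.50
Main.reverse 600
Main.walk 200
END_SAMPLE 0.50
EOF

# Print the identifier column of every line inside a sample,
# then count the unique names, i.e. the number of bands needed.
awk '/^END_SAMPLE/   {in_sample = 0}
     in_sample       {print $1}
     /^BEGIN_SAMPLE/ {in_sample = 1}' BANDS.hp | sort -u | wc -l
```

<para>If the count exceeds 20, the surplus identifiers end up in the
<literal>OTHER</literal> band unless <option>-l</option> or
<option>-m</option> is used.</para>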
1177 <sect2 id="manipulating-hp">
1178 <title>Manipulating the hp file</title>
<para>(Notes kindly offered by Jan-Willem Maessen.)</para>
1183 The <filename>FOO.hp</filename> file produced when you ask for the
1184 heap profile of a program <filename>FOO</filename> is a text file with a particularly
1185 simple structure. Here's a representative example, with much of the
1186 actual data omitted:
1189 DATE "Thu Dec 26 18:17 2002"
1190 SAMPLE_UNIT "seconds"
1201 BEGIN_SAMPLE 11695.47
1204 The first four lines (<literal>JOB</literal>, <literal>DATE</literal>, <literal>SAMPLE_UNIT</literal>, <literal>VALUE_UNIT</literal>) form a
1205 header. Each block of lines starting with <literal>BEGIN_SAMPLE</literal> and ending
1206 with <literal>END_SAMPLE</literal> forms a single sample (you can think of this as a
vertical slice of your heap profile). The <command>hp2ps</command> utility should accept
any input with a properly-formatted header followed by a series of
samples.
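<para>Concretely, a well-formed file can be tiny. The sketch below
writes out a minimal, made-up <filename>.hp</filename> file and checks
that its <literal>BEGIN_SAMPLE</literal> and
<literal>END_SAMPLE</literal> lines are balanced; the file name and
contents are illustrative only.</para>

```shell
# A minimal well-formed heap profile: four header lines, one sample.
# (File name and contents are made up for illustration.)
cat > MINIMAL.hp <<'EOF'
JOB "MINIMAL"
DATE "Thu Dec 26 18:17 2002"
SAMPLE_UNIT "seconds"
VALUE_UNIT "bytes"
BEGIN_SAMPLE 0.00
Main.go 1024
END_SAMPLE 0.00
EOF

# Sanity check: every BEGIN_SAMPLE should have a matching END_SAMPLE.
begins=$(grep -c '^BEGIN_SAMPLE' MINIMAL.hp)
ends=$(grep -c '^END_SAMPLE' MINIMAL.hp)
echo "samples=$begins balanced=$([ "$begins" -eq "$ends" ] && echo yes || echo no)"
```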
1214 <title>Zooming in on regions of your profile</title>
1217 You can look at particular regions of your profile simply by loading a
1218 copy of the <filename>.hp</filename> file into a text editor and deleting the unwanted
samples. The resulting <filename>.hp</filename> file can be run through <command>hp2ps</command> and viewed.
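<para>The manual deletion can also be scripted. The sketch below keeps
the header plus only those samples whose timestamp falls inside a
given window; the file names, window bounds and profile contents are
all illustrative.</para>

```shell
# An illustrative profile with three samples, at t = 0, 5 and 10.
cat > FULL.hp <<'EOF'
JOB "FULL"
DATE "Thu Dec 26 18:17 2002"
SAMPLE_UNIT "seconds"
VALUE_UNIT "bytes"
BEGIN_SAMPLE 0.00
Main.a 100
END_SAMPLE 0.00
BEGIN_SAMPLE 5.00
Main.a 200
END_SAMPLE 5.00
BEGIN_SAMPLE 10.00
Main.a 300
END_SAMPLE 10.00
EOF

# Keep the header, plus samples with 4.0 <= t <= 6.0.
awk -v lo=4.0 -v hi=6.0 '
  /^BEGIN_SAMPLE/ {in_sample = 1; keep = ($2 + 0 >= lo && $2 + 0 <= hi)}
  !in_sample      {print; next}      # header lines pass through
  keep            {print}            # lines of samples inside the window
  /^END_SAMPLE/   {in_sample = 0}
' FULL.hp > ZOOM.hp

grep -c '^BEGIN_SAMPLE' ZOOM.hp    # prints 1: only the t = 5.00 sample survives
```

<para>The resulting <filename>ZOOM.hp</filename> is itself a
well-formed profile, so it can be fed straight to
<command>hp2ps</command>.</para>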
1225 <title>Viewing the heap profile of a running program</title>
1228 The <filename>.hp</filename> file is generated incrementally as your
1229 program runs. In principle, running <command>hp2ps</command> on the incomplete file
1230 should produce a snapshot of your program's heap usage. However, the
1231 last sample in the file may be incomplete, causing <command>hp2ps</command> to fail. If
1232 you are using a machine with UNIX utilities installed, it's not too
1233 hard to work around this problem (though the resulting command line
1234 looks rather Byzantine):
1236 head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
1240 The command <command>fgrep -n END_SAMPLE FOO.hp</command> finds the
1241 end of every complete sample in <filename>FOO.hp</filename>, and labels each sample with
1242 its ending line number. We then select the line number of the last
1243 complete sample using <command>tail</command> and <command>cut</command>. This is used as a
1244 parameter to <command>head</command>; the result is as if we deleted the final
1245 incomplete sample from <filename>FOO.hp</filename>. This results in a properly-formatted
1246 .hp file which we feed directly to <command>hp2ps</command>.
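<para>The pipeline can be wrapped in a small shell function. The
sketch below defines one and demonstrates it on a made-up profile
whose final sample was cut off mid-write; the function name
<literal>truncate_hp</literal> and the file names are
illustrative.</para>

```shell
# Truncate an .hp file after its last complete sample, as described above.
truncate_hp () {
  head -"$(fgrep -n END_SAMPLE "$1" | tail -1 | cut -d : -f 1)" "$1"
}

# A made-up profile whose second sample is incomplete.
cat > FOO.hp <<'EOF'
JOB "FOO"
DATE "Thu Dec 26 18:17 2002"
SAMPLE_UNIT "seconds"
VALUE_UNIT "bytes"
BEGIN_SAMPLE 0.00
Main.go 100
END_SAMPLE 0.00
BEGIN_SAMPLE 0.50
Main.go 200
EOF

truncate_hp FOO.hp > TRUNC.hp
tail -1 TRUNC.hp    # prints "END_SAMPLE 0.00": the file now ends cleanly
```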
1250 <title>Viewing a heap profile in real time</title>
The <command>gv</command> and <command>ghostview</command> programs
have a "watch file" option that can be used to view an up-to-date heap
profile of your program as it runs. Simply generate an incremental
heap profile as described in the previous section. Run <command>gv</command> on your
1259 gv -watch -seascape FOO.ps
1261 If you forget the <literal>-watch</literal> flag you can still select
1262 "Watch file" from the "State" menu. Now each time you generate a new
1263 profile <filename>FOO.ps</filename> the view will update automatically.
1267 This can all be encapsulated in a little script:
1270 head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
1272 gv -watch -seascape FOO.ps &
1274 sleep 10 # We generate a new profile every 10 seconds.
1275 head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
1279 Occasionally <command>gv</command> will choke as it tries to read an incomplete copy of
1280 <filename>FOO.ps</filename> (because <command>hp2ps</command> is still running as an update
1281 occurs). A slightly more complicated script works around this
1282 problem, by using the fact that sending a SIGHUP to gv will cause it
1283 to re-read its input file:
1286 head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
1292 head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
1302 <title>Observing Code Coverage</title>
1303 <indexterm><primary>code coverage</primary></indexterm>
1304 <indexterm><primary>Haskell Program Coverage</primary></indexterm>
1305 <indexterm><primary>hpc</primary></indexterm>
Code coverage tools allow a programmer to determine which parts of
their code have actually been executed, and which parts have
never been invoked. GHC has an option for generating
instrumented code that records code coverage as part of the
<ulink url="http://www.haskell.org/hpc">Haskell Program Coverage
</ulink>(HPC) toolkit, which is included with GHC. The HPC tools can
be used to render the generated code coverage information in a
human-understandable format. </para>
1318 Correctly instrumented code provides coverage information of two
1319 kinds: source coverage and boolean-control coverage. Source
1320 coverage is the extent to which every part of the program was
1321 used, measured at three different levels: declarations (both
1322 top-level and local), alternatives (among several equations or
1323 case branches) and expressions (at every level). Boolean
1324 coverage is the extent to which each of the values True and
1325 False is obtained in every syntactic boolean context (ie. guard,
1326 condition, qualifier). </para>
1329 HPC displays both kinds of information in two primary ways:
1330 textual reports with summary statistics (hpc report) and sources
1331 with color mark-up (hpc markup). For boolean coverage, there
1332 are four possible outcomes for each guard, condition or
1333 qualifier: both True and False values occur; only True; only
1334 False; never evaluated. In hpc-markup output, highlighting with
1335 a yellow background indicates a part of the program that was
1336 never evaluated; a green background indicates an always-True
1337 expression and a red background indicates an always-False one.
1340 <sect2><title>A small example: Reciprocation</title>
As an example, we have a program called <filename>Recip.hs</filename>, which computes exact decimal
representations of reciprocals, with recurring parts indicated in
1348 reciprocal :: Int -> (String, Int)
1349 reciprocal n | n > 1 = ('0' : '.' : digits, recur)
"attempting to compute reciprocal of number &lt;= 1"
1353 (digits, recur) = divide n 1 []
1354 divide :: Int -> Int -> [Int] -> (String, Int)
1355 divide n c cs | c `elem` cs = ([], position c cs)
1356 | r == 0 = (show q, 0)
1357 | r /= 0 = (show q ++ digits, recur)
1359 (q, r) = (c*10) `quotRem` n
1360 (digits, recur) = divide n r (c:cs)
1362 position :: Int -> [Int] -> Int
1363 position n (x:xs) | n==x = 1
1364 | otherwise = 1 + position n xs
1366 showRecip :: Int -> String
1368 "1/" ++ show n ++ " = " ++
1369 if r==0 then d else take p d ++ "(" ++ drop p d ++ ")"
1372 (d, r) = reciprocal n
1376 putStrLn (showRecip number)
<para>The HPC instrumentation is enabled using the <option>-fhpc</option> flag.
1384 $ ghc -fhpc Recip.hs --make
<para>HPC index (<filename>.mix</filename>) files are placed in the <filename>.hpc</filename> subdirectory. These can be thought of as
the <filename>.hi</filename> files for HPC.
1394 <para>We can generate a textual summary of coverage:</para>
1397 80% expressions used (81/101)
1398 12% boolean coverage (1/8)
1399 14% guards (1/7), 3 always True,
1402 0% 'if' conditions (0/1), 1 always False
1403 100% qualifiers (0/0)
1404 55% alternatives used (5/9)
1405 100% local declarations used (9/9)
1406 100% top-level declarations used (5/5)
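<para>Since the textual report has a regular, line-oriented shape, it
can be post-processed with standard tools. The sketch below extracts
the expression-coverage percentage from a saved copy of a report; the
file name is illustrative, and the figures are taken from the sample
output above.</para>

```shell
# A saved copy of (part of) an hpc report, as shown above.
cat > report.txt <<'EOF'
 80% expressions used (81/101)
 12% boolean coverage (1/8)
 55% alternatives used (5/9)
100% top-level declarations used (5/5)
EOF

# Pull out the percentage on the "expressions used" line.
awk '/expressions used/ {sub(/%/, "", $1); print $1}' report.txt   # prints 80
```

<para>For anything more elaborate, the <option>--xml-output</option>
option of <command>hpc report</command> is the more robust
interface.</para>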
1408 <para>We can also generate a marked-up version of the source.</para>
1411 writing Recip.hs.html
1414 This generates one file per Haskell module, and 4 index files,
1415 hpc_index.html, hpc_index_alt.html, hpc_index_exp.html,
1420 <sect2><title>Options for instrumenting code for coverage</title>
Turning on code coverage is easy: use the <option>-fhpc</option> flag.
Instrumented and non-instrumented modules can be freely mixed.
When compiling the Main module, GHC automatically detects when there
is an hpc-compiled file, and adds the correct initialization code.
1430 <sect2><title>The hpc toolkit</title>
The hpc toolkit uses a cvs/svn/darcs-like interface, in which a
single binary provides many functional units.</para>
1437 Usage: hpc COMMAND ...
1440 help Display help for hpc or a single command
1442 report Output textual report about program coverage
1443 markup Markup Haskell source with program coverage
1444 Processing Coverage files:
1445 sum Sum multiple .tix files in a single .tix file
1446 combine Combine two .tix files in a single .tix file
1447 map Map a function over a single .tix file
1449 overlay Generate a .tix file from an overlay file
1450 draft Generate draft overlay that provides 100% coverage
1452 show Show .tix file in readable, verbose format
1453 version Display version for hpc
<para>In general, these commands act on a .tix file after an
instrumented binary has generated it, with hpc acting as a
conduit between the raw .tix file and the more detailed reports
The hpc tool assumes you are in the top-level directory of
the location where you built your application, and that the .tix
file is in the same top-level directory. You can use the
flag --srcdir to use hpc from any other directory, and use
--srcdir multiple times to analyse programs compiled from
different locations, as is typical for packages.
We now explain the major modes of hpc in more detail.
1475 <sect3><title>hpc report</title>
<para>hpc report gives a textual report of coverage. By default,
all modules and packages are considered in generating the report,
unless include or exclude are used. The report is a summary
unless the --per-module flag is used. The --xml-output option
allows other tools to use hpc to glean coverage information.
Usage: hpc report [OPTION] .. &lt;TIX_FILE&gt; [&lt;MODULE&gt; [&lt;MODULE&gt; ..]]
1488 --per-module show module level detail
1489 --decl-list show unused decls
1490 --exclude=[PACKAGE:][MODULE] exclude MODULE and/or PACKAGE
1491 --include=[PACKAGE:][MODULE] include MODULE and/or PACKAGE
1492 --srcdir=DIR path to source directory of .hs files
1493 multi-use of srcdir possible
1494 --hpcdir=DIR sub-directory that contains .mix files
1495 default .hpc [rarely used]
1496 --xml-output show output in XML
1499 <sect3><title>hpc markup</title>
<para>hpc markup marks up source files into colored HTML.
Usage: hpc markup [OPTION] .. &lt;TIX_FILE&gt; [&lt;MODULE&gt; [&lt;MODULE&gt; ..]]
1508 --exclude=[PACKAGE:][MODULE] exclude MODULE and/or PACKAGE
1509 --include=[PACKAGE:][MODULE] include MODULE and/or PACKAGE
1510 --srcdir=DIR path to source directory of .hs files
1511 multi-use of srcdir possible
1512 --hpcdir=DIR sub-directory that contains .mix files
1513 default .hpc [rarely used]
1514 --fun-entry-count show top-level function entry counts
--highlight-covered highlight covered code, rather than code gaps
1516 --destdir=DIR path to write output to
1520 <sect3><title>hpc sum</title>
1521 <para>hpc sum adds together any number of .tix files into a single
1522 .tix file. hpc sum does not change the original .tix file; it generates a new .tix file.
Usage: hpc sum [OPTION] .. &lt;TIX_FILE&gt; [&lt;TIX_FILE&gt; [&lt;TIX_FILE&gt; ..]]
1527 Sum multiple .tix files in a single .tix file
1531 --exclude=[PACKAGE:][MODULE] exclude MODULE and/or PACKAGE
1532 --include=[PACKAGE:][MODULE] include MODULE and/or PACKAGE
1533 --output=FILE output FILE
1534 --union use the union of the module namespace (default is intersection)
1537 <sect3><title>hpc combine</title>
<para>hpc combine is the Swiss Army knife of hpc. It can be
used to take the difference between two .tix files, or to add
two .tix files together. hpc combine does not
change the original .tix files; it generates a new .tix file.
Usage: hpc combine [OPTION] .. &lt;TIX_FILE&gt; &lt;TIX_FILE&gt;
1546 Combine two .tix files in a single .tix file
1550 --exclude=[PACKAGE:][MODULE] exclude MODULE and/or PACKAGE
1551 --include=[PACKAGE:][MODULE] include MODULE and/or PACKAGE
1552 --output=FILE output FILE
1553 --function=FUNCTION combine .tix files with join function, default = ADD
1554 FUNCTION = ADD | DIFF | SUB
1555 --union use the union of the module namespace (default is intersection)
1558 <sect3><title>hpc map</title>
1559 <para>hpc map inverts or zeros a .tix file. hpc map does not
1560 change the original .tix file; it generates a new .tix file.
Usage: hpc map [OPTION] .. &lt;TIX_FILE&gt;
1565 Map a function over a single .tix file
1569 --exclude=[PACKAGE:][MODULE] exclude MODULE and/or PACKAGE
1570 --include=[PACKAGE:][MODULE] include MODULE and/or PACKAGE
1571 --output=FILE output FILE
1572 --function=FUNCTION apply function to .tix files, default = ID
1573 FUNCTION = ID | INV | ZERO
1574 --union use the union of the module namespace (default is intersection)
1577 <sect3><title>hpc overlay and hpc draft</title>
Overlays are an experimental feature of HPC: a textual description
of coverage. hpc draft is used to generate a draft overlay from a .tix file,
and hpc overlay generates a .tix file from an overlay.
Usage: hpc overlay [OPTION] .. &lt;OVERLAY_FILE&gt; [&lt;OVERLAY_FILE&gt; [...]]
1589 --srcdir=DIR path to source directory of .hs files
1590 multi-use of srcdir possible
1591 --hpcdir=DIR sub-directory that contains .mix files
1592 default .hpc [rarely used]
1593 --output=FILE output FILE
Usage: hpc draft [OPTION] .. &lt;TIX_FILE&gt;
1599 --exclude=[PACKAGE:][MODULE] exclude MODULE and/or PACKAGE
1600 --include=[PACKAGE:][MODULE] include MODULE and/or PACKAGE
1601 --srcdir=DIR path to source directory of .hs files
1602 multi-use of srcdir possible
1603 --hpcdir=DIR sub-directory that contains .mix files
1604 default .hpc [rarely used]
1605 --output=FILE output FILE
1609 <sect2><title>Caveats and Shortcomings of Haskell Program Coverage</title>
1611 HPC does not attempt to lock the .tix file, so multiple concurrently running
1612 binaries in the same directory will exhibit a race condition. There is no way
1613 to change the name of the .tix file generated, apart from renaming the binary.
1614 HPC does not work with GHCi.
1619 <sect1 id="ticky-ticky">
1620 <title>Using “ticky-ticky” profiling (for implementors)</title>
1621 <indexterm><primary>ticky-ticky profiling</primary></indexterm>
1623 <para>(ToDo: document properly.)</para>
1625 <para>It is possible to compile Haskell programs so that
1626 they will count lots and lots of interesting things, e.g., number
1627 of updates, number of data constructors entered, etc., etc. We
1628 call this “ticky-ticky”
1629 profiling,<indexterm><primary>ticky-ticky
1630 profiling</primary></indexterm> <indexterm><primary>profiling,
1631 ticky-ticky</primary></indexterm> because that's the sound a CPU
1632 makes when it is running up all those counters
1633 (<emphasis>slowly</emphasis>).</para>
1635 <para>Ticky-ticky profiling is mainly intended for implementors;
1636 it is quite separate from the main “cost-centre”
1637 profiling system, intended for all users everywhere.</para>
1640 You don't need to build GHC, the libraries, or the RTS a special
1641 way in order to use ticky-ticky profiling. You can decide on a
1642 module-by-module basis which parts of a program have the
1643 counters compiled in, using the
1644 compile-time <option>-ticky</option> option. Those modules that
1645 were not compiled with <option>-ticky</option> won't contribute
1646 to the ticky-ticky profiling results, and that will normally
include all the pre-compiled packages that your program links against.
To get your compiled program to spit out the ticky-ticky numbers, you need to:
1658 Link the program with <option>-debug</option>
1659 (<option>-ticky</option> is a synonym
1660 for <option>-debug</option> at link-time). This links in
1661 the debug version of the RTS, which includes the code for
aggregating and reporting the results of ticky-ticky profiling.
1668 Run the program with the <option>-r</option> RTS
1669 option<indexterm><primary>-r RTS option</primary></indexterm>.
1670 See <xref linkend="runtime-control"/>.
1677 Here is a sample ticky-ticky statistics file, generated by
1679 <command>foo +RTS -rfoo.ticky</command>.
1683 foo +RTS -rfoo.ticky
1685 ALLOCATIONS: 3964631 (11330900 words total: 3999476 admin, 6098829 goods, 1232595 slop)
1686 total words: 2 3 4 5 6+
1687 69647 ( 1.8%) function values 50.0 50.0 0.0 0.0 0.0
1688 2382937 ( 60.1%) thunks 0.0 83.9 16.1 0.0 0.0
1689 1477218 ( 37.3%) data values 66.8 33.2 0.0 0.0 0.0
1690 0 ( 0.0%) big tuples
1691 2 ( 0.0%) black holes 0.0 100.0 0.0 0.0 0.0
1692 0 ( 0.0%) prim things
1693 34825 ( 0.9%) partial applications 0.0 0.0 0.0 100.0 0.0
1694 2 ( 0.0%) thread state objects 0.0 0.0 0.0 0.0 100.0
1696 Total storage-manager allocations: 3647137 (11882004 words)
1697 [551104 words lost to speculative heap-checks]
1701 ENTERS: 9400092 of which 2005772 (21.3%) direct to the entry code
1702 [the rest indirected via Node's info ptr]
1703 1860318 ( 19.8%) thunks
1704 3733184 ( 39.7%) data values
1705 3149544 ( 33.5%) function values
1706 [of which 1999880 (63.5%) bypassed arg-satisfaction chk]
1707 348140 ( 3.7%) partial applications
1708 308906 ( 3.3%) normal indirections
1709 0 ( 0.0%) permanent indirections
1712 2137257 ( 36.4%) from entering a new constructor
1713 [the rest from entering an existing constructor]
1714 2349219 ( 40.0%) vectored [the rest unvectored]
1716 RET_NEW: 2137257: 32.5% 46.2% 21.3% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
1717 RET_OLD: 3733184: 2.8% 67.9% 29.3% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
1718 RET_UNBOXED_TUP: 2: 0.0% 0.0%100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
1720 RET_VEC_RETURN : 2349219: 0.0% 0.0%100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
1722 UPDATE FRAMES: 2241725 (0 omitted from thunks)
1726 0 ( 0.0%) data values
1727 34827 ( 1.6%) partial applications
1728 [2 in place, 34825 allocated new space]
1729 2206898 ( 98.4%) updates to existing heap objects (46 by squeezing)
1730 UPD_CON_IN_NEW: 0: 0 0 0 0 0 0 0 0 0
1731 UPD_PAP_IN_NEW: 34825: 0 0 0 34825 0 0 0 0 0
1733 NEW GEN UPDATES: 2274700 ( 99.9%)
1735 OLD GEN UPDATES: 1852 ( 0.1%)
1737 Total bytes copied during GC: 190096
1739 **************************************************
1740 3647137 ALLOC_HEAP_ctr
1741 11882004 ALLOC_HEAP_tot
1746 34831 ALLOC_FUN_hst_0
1747 34816 ALLOC_FUN_hst_1
1751 2382937 ALLOC_UP_THK_ctr
1754 0 E!NT_PERM_IND_ctr requires +RTS -Z
1755 [... lots more info omitted ...]
1756 0 GC_SEL_ABANDONED_ctr
1759 0 GC_FAILED_PROMOTION_ctr
1760 47524 GC_WORDS_COPIED_ctr
1763 <para>The formatting of the information above the row of asterisks
1764 is subject to change, but hopefully provides a useful
1765 human-readable summary. Below the asterisks <emphasis>all
1766 counters</emphasis> maintained by the ticky-ticky system are
1767 dumped, in a format intended to be machine-readable: zero or more
1768 spaces, an integer, a space, the counter name, and a newline.</para>
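<para>Given that format, the counter block is straightforward to
parse mechanically. The sketch below extracts the counters that
follow the row of asterisks as <literal>name=value</literal> pairs,
using a saved fragment of the sample output above (the file name is
illustrative).</para>

```shell
# A saved fragment of ticky output: counters follow the asterisk row.
cat > foo.ticky <<'EOF'
Total bytes copied during GC: 190096

**************************************************
  3647137 ALLOC_HEAP_ctr
 11882004 ALLOC_HEAP_tot
  2382937 ALLOC_UP_THK_ctr
        0 E!NT_PERM_IND_ctr requires +RTS -Z
EOF

# After the asterisk row: optional spaces, an integer, a space, the name.
awk 'seen && NF >= 2 {print $2 "=" $1}
     /^\*+$/         {seen = 1}' foo.ticky
```

<para>As noted below, software consuming this output should still
check that the counters it expects are actually present, since
invalid counters may be omitted or renamed.</para>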
1770 <para>In fact, not <emphasis>all</emphasis> counters are
1771 necessarily dumped; compile- or run-time flags can render certain
1772 counters invalid. In this case, either the counter will simply
1773 not appear, or it will appear with a modified counter name,
1774 possibly along with an explanation for the omission (notice
1775 <literal>ENT_PERM_IND_ctr</literal> appears
1776 with an inserted <literal>!</literal> above). Software analysing
1777 this output should always check that it has the counters it
1778 expects. Also, beware: some of the counters can have
1779 <emphasis>large</emphasis> values!</para>
1786 ;;; Local Variables: ***
1787 ;;; sgml-parent-document: ("users_guide.xml" "book" "chapter") ***