X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=ghc%2Fdocs%2Fusers_guide%2Fprofiling.sgml;h=662164545bf82bfae7f3b80d4b0f8f3ac3960cdd;hb=73641e01ee9dfbe83f8c6225c1f6ae2e7d621b63;hp=a0bd4f68b51f46de1e97bc9fb0f1947997722ff2;hpb=c7f8f1e62b555462f98c3f813440559116033a99;p=ghc-hetmet.git
diff --git a/ghc/docs/users_guide/profiling.sgml b/ghc/docs/users_guide/profiling.sgml
index a0bd4f6..6621645 100644
--- a/ghc/docs/users_guide/profiling.sgml
+++ b/ghc/docs/users_guide/profiling.sgml
@@ -1,909 +1,1107 @@
-
-Profiling
-
-
-
-profiling, with cost-centres
-cost-centre profiling
-
-
-
-Glasgow Haskell comes with a time and space profiling system. Its
-purpose is to help you improve your understanding of your program's
-execution behaviour, so you can improve it.
-
-
-
-Any comments, suggestions and/or improvements you have are welcome.
-Recommended ``profiling tricks'' would be especially cool!
-
-
-
-How to profile a Haskell program
-
-
-
-The GHC approach to profiling is very simple: annotate the expressions
-you consider ``interesting'' with cost centre labels (strings);
-so, for example, you might have:
-
-
-
-
-
-f x y
- = let
- output1 = _scc_ "Pass1" ( pass1 x )
- output2 = _scc_ "Pass2" ( pass2 output1 y )
- output3 = _scc_ "Pass3" ( pass3 (output2 `zip` [1 .. ]) )
- in concat output3
-
-
-
-
-
-The costs of the evaluating the expressions bound to output1,
-output2 and output3 will be attributed to the ``cost
-centres'' Pass1, Pass2 and Pass3, respectively.
-
-
-
-The costs of evaluating other expressions, e.g., concat output4,
-will be inherited by the scope which referenced the function f.
-
-
-
-You can put in cost-centres via _scc_ constructs by hand, as in the
-example above. Perfectly cool. That's probably what you
-would do if your program divided into obvious ``passes'' or
-``phases'', or whatever.
-
-
-
-If your program is large or you have no clue what might be gobbling
-all the time, you can get GHC to mark all functions with _scc_
-constructs, automagically. Add an -auto compilation flag to the
-usual -prof option.
-
-
-
-Once you start homing in on the Guilty Suspects, you may well switch
-from automagically-inserted cost-centres to a few well-chosen ones of
-your own.
-
-
-
-To use profiling, you must compile and run with special
-options. (We usually forget the ``run'' magic!—Do as we say, not as
-we do…) Details follow.
-
-
-
-If you're serious about this profiling game, you should probably read
-one or more of the Sansom/Peyton Jones papers about the GHC profiling
-system. Just visit the Glasgow FP group web page…
-
-
-
-
-
-Compiling programs for profiling
-
-
-
-profiling options
-options, for profiling
-
-
-
-To make use of the cost centre profiling system all modules must
-be compiled and linked with the -prof option.-prof option
-Any _scc_ constructs you've put in your source will spring to life.
-
-
-
-Without a -prof option, your _scc_s are ignored; so you can
-compiled _scc_-laden code without changing it.
-
-
-
-There are a few other profiling-related compilation options. Use them
-in addition to-prof. These do not have to be used
-consistently for all modules in a program.
-
-
-
-
-
-
--auto:
-
-
--auto option
-cost centres, automatically inserting
-GHC will automatically add _scc_ constructs for
-all top-level, exported functions.
-
-
-
-
--auto-all:
-
-
--auto-all option
-All top-level functions, exported or not, will be automatically
-_scc_'d.
-
-
-
-
--caf-all:
-
-
--caf-all option
-The costs of all CAFs in a module are usually attributed to one
-``big'' CAF cost-centre. With this option, all CAFs get their own cost-centre.
-An ``if all else fails'' option…
-
-
-
-
--ignore-scc:
-
-
--ignore-scc option
-Ignore any _scc_ constructs,
-so a module which already has _scc_s can be
-compiled for profiling with the annotations ignored.
-
-
-
-
--G<group>:
-
-
--G<group> option
-Specifies the <group> to be attached to all the cost-centres
-declared in the module. If no group is specified it defaults to the
-module name.
-
-
-
-
-
-
-
-In addition to the -prof option your system might be setup to enable
-you to compile and link with the -prof-details-prof-details
-option option instead. This enables additional detailed counts
-to be reported with the -P RTS option.
-
-
-
-
-
-How to control your profiled program at runtime
-
-
-
-profiling RTS options
-RTS options, for profiling
-
-
-
-It isn't enough to compile your program for profiling with -prof!
-
-
-
-When you run your profiled program, you must tell the runtime
-system (RTS) what you want to profile (e.g., time and/or space), and
-how you wish the collected data to be reported. You also may wish to
-set the sampling interval used in time profiling.
-
-
-
-Executive summary: ./a.out +RTS -pT produces a time profile in
-a.out.prof; ./a.out +RTS -hC produces space-profiling
-info which can be mangled by hp2ps and viewed with ghostview
-(or equivalent).
-
-
-
-Profiling runtime flags are passed to your program between the usual
-+RTS and -RTS options.
-
-
-
-
-
-
--p<sort> or -P<sort>:
-
-
--p<sort> RTS option (profiling)
--P<sort> RTS option (profiling)
-time profile
-serial time profile
-The -p? option produces a standard time profile report.
-It is written into the file <program>@.prof.
-
-
-
-The -P? option produces a more detailed report containing the
-actual time and allocation data as well. (Not used much.)
-
-
-
-The <sort> indicates how the cost centres are to be sorted in the
-report. Valid <sort> options are:
-
-
-
-T:
-
-
-by time, largest first (the default);
-
-
-
-
-A:
-
-
-by bytes allocated, largest first;
-
-
-
-
-C:
-
-
-alphabetically by group, module and cost centre.
-
-
-
-
-
-
-
-
--i<secs>:
-
-
--i<secs> RTS option
-(profiling) Set the profiling (sampling) interval to <secs>
-seconds (the default is 1 second). Fractions are allowed: for example
--i0.2 will get 5 samples per second.
-
-
-
-
--h<break-down>:
-
-
--h<break-down> RTS option (profiling)
-heap profile
-
-
-
-Produce a detailed space profile of the heap occupied by live
-closures. The profile is written to the file <program>@.hp from
-which a PostScript graph can be produced using hp2ps (see
-).
-
-
-
-The heap space profile may be broken down by different criteria:
-
-
-
--hC:
-
-
-cost centre which produced the closure (the default).
-
-
-
-
--hM:
-
-
-cost centre module which produced the closure.
-
-
-
-
--hG:
-
-
-cost centre group which produced the closure.
-
-
-
-
--hD:
-
-
-closure description—a string describing the closure.
-
-
-
-
--hY:
-
-
-closure type—a string describing the closure's type.
-
-
-
-
-By default all live closures in the heap are profiled, but particular
-closures of interest can be selected (see below).
-
-
-
-
-
-
-
-Heap (space) profiling uses hash tables. If these tables
-should fill the run will abort. The
--z<tbl><size>-z<tbl><size> RTS option (profiling) option is used to
-increase the size of the relevant hash table (C, M,
-G, D or Y, defined as for <break-down> above). The
-actual size used is the next largest power of 2.
-
-
-
-The heap profile can be restricted to particular closures of interest.
-The closures of interest can selected by the attached cost centre
-(module:label, module and group), closure category (description, type,
-and kind) using the following options:
-
-
-
-
-
-
--c{<mod>:<lab>,<mod>:<lab>...}:
-
-
--c{<lab> RTS option (profiling)}
-Selects individual cost centre(s).
-
-
-
-
--m{<mod>,<mod>...}:
-
-
--m{<mod> RTS option (profiling)}
-Selects all cost centres from the module(s) specified.
-
-
-
-
--g{<grp>,<grp>...}:
-
-
--g{<grp> RTS option (profiling)}
-Selects all cost centres from the groups(s) specified.
-
-
-
-
--d{<des>,<des>...}:
-
-
--d{<des> RTS option (profiling)}
-Selects closures which have one of the specified descriptions.
-
-
-
-
--y{<typ>,<typ>...}:
-
-
--y{<typ> RTS option (profiling)}
-Selects closures which have one of the specified type descriptions.
-
-
-
-
--k{<knd>,<knd>...}:
-
-
--k{<knd> RTS option (profiling)}
-Selects closures which are of one of the specified closure kinds.
-Valid closure kinds are CON (constructor), FN (manifest
-function), PAP (partial application), BH (black hole) and
-THK (thunk).
-
-
-
-
-
-
-
-The space occupied by a closure will be reported in the heap profile
-if the closure satisfies the following logical expression:
-
-
-
-([-c] or [-m] or [-g]) and ([-d] or [-y] or [-k])
-
-
-
-where a particular option is true if the closure (or its attached cost
-centre) is selected by the option (or the option is not specified).
-
-
-
-
-
-What's in a profiling report?
-
-
-
-profiling report, meaning thereof
-
-
-
-When you run your profiled program with the -p RTS option -p
-RTS option, you get the following information about your ``cost
-centres'':
-
-
-
-
-
-
-COST CENTRE:
-
-
-The cost-centre's name.
-
-
-
-
-MODULE:
-
-
-The module associated with the cost-centre;
-important mostly if you have identically-named cost-centres in
-different modules.
-
-
-
-
-scc:
-
-
-How many times this cost-centre was entered; think
-of it as ``I got to the _scc_ construct this many times…''
-
-
-
-
-%time:
-
-
-What part of the time was spent in this cost-centre (see also ``ticks,''
-below).
-
-
-
-
-%alloc:
-
-
-What part of the memory allocation was done in this cost-centre
-(see also ``bytes,'' below).
-
-
-
-
-inner:
-
-
-How many times this cost-centre ``passed control'' to an inner
-cost-centre; for example, scc=4 plus subscc=8 means
-``This _scc_ was entered four times, but went out to
-other _scc_s eight times.''
-
-
-
-
-cafs:
-
-
-CAF, profiling
-How many CAFs this cost centre evaluated.
-
-
-
-
-dicts:
-
-
-Dictionaries, profiling
-How many dictionaries this cost centre evaluated.
-
-
-
-
-
-
-
-In addition you can use the -P RTS option to get the following additional information:
-
-
-
-ticks:
-
-
-The raw number of time ``ticks'' which were
-attributed to this cost-centre; from this, we get the %time
-figure mentioned above.
-
-
-
-
-bytes:
-
-
-Number of bytes allocated in the heap while in
-this cost-centre; again, this is the raw number from which we
-get the %alloc figure mentioned above.
-
-
-
-
-
-
-
-Finally if you built your program with -prof-details
- the -P RTS option will also
-produce the following information:
-
-
-
-closures:
-
-
-closures, profiling
-How many heap objects were allocated; these objects may be of varying
-size. If you divide the number of bytes (mentioned below) by this
-number of ``closures'', then you will get the average object size.
-(Not too interesting, but still…)
-
-
-
-
-thunks:
-
-
-thunks, profiling
-How many times we entered (evaluated) a thunk—an unevaluated
-object in the heap—while we were in this cost-centre.
-
-
-
-
-funcs:
-
-
-functions, profiling
-How many times we entered (evaluated) a function while we we in this
-cost-centre. (In Haskell, functions are first-class values and may be
-passed as arguments, returned as results, evaluated, and generally
-manipulated just like data values)
-
-
-
-
-PAPs:
-
-
-partial applications, profiling
-How many times we entered (evaluated) a partial application (PAP), i.e.,
-a function applied to fewer arguments than it needs. For example, Int
-addition applied to one argument would be a PAP. A PAP is really
-just a particular form for a function.
-
-
-
-
-
-
-
-
-
-Producing graphical heap profiles
-
-
-
-heap profiles, producing
-
-
-
-Utility programs which produce graphical profiles.
-
-
-
-hp2ps--heap profile to PostScript
-
-
-
-hp2ps (utility)
-heap profiles
-PostScript, from heap profiles
-
-
-
-Usage:
-
-
-
-
-
-hp2ps [flags] [<file>[.stat]]
-
-
-
-
-
-The program hp2pshp2ps program converts a heap profile
-as produced by the -h<break-down>-h<break-down> RTS
-option runtime option into a PostScript graph of the heap
-profile. By convention, the file to be processed by hp2ps has a
-.hp extension. The PostScript output is written to <file>@.ps. If
-<file> is omitted entirely, then the program behaves as a filter.
-
-
-
-hp2ps is distributed in ghc/utils/hp2ps in a GHC source
-distribution. It was originally developed by Dave Wakeling as part of
-the HBC/LML heap profiler.
-
-
-
-The flags are:
-
-
-
--d
-
-
-In order to make graphs more readable, hp2ps sorts the shaded
-bands for each identifier. The default sort ordering is for the bands
-with the largest area to be stacked on top of the smaller ones. The
--d option causes rougher bands (those representing series of
-values with the largest standard deviations) to be stacked on top of
-smoother ones.
-
-
-
-
--b
-
-
-Normally, hp2ps puts the title of the graph in a small box at the
-top of the page. However, if the JOB string is too long to fit in a
-small box (more than 35 characters), then
-hp2ps will choose to use a big box instead. The -b
-option forces hp2ps to use a big box.
-
-
-
-
--e<float>[in|mm|pt]
-
-
-Generate encapsulated PostScript suitable for inclusion in LaTeX
-documents. Usually, the PostScript graph is drawn in landscape mode
-in an area 9 inches wide by 6 inches high, and hp2ps arranges
-for this area to be approximately centred on a sheet of a4 paper.
-This format is convenient of studying the graph in detail, but it is
-unsuitable for inclusion in LaTeX documents. The -e option
-causes the graph to be drawn in portrait mode, with float specifying
-the width in inches, millimetres or points (the default). The
-resulting PostScript file conforms to the Encapsulated PostScript
-(EPS) convention, and it can be included in a LaTeX document using
-Rokicki's dvi-to-PostScript converter dvips.
-
-
-
-
--g
-
-
-Create output suitable for the gs PostScript previewer (or
-similar). In this case the graph is printed in portrait mode without
-scaling. The output is unsuitable for a laser printer.
-
-
-
-
--l
-
-
-Normally a profile is limited to 20 bands with additional identifiers
-being grouped into an OTHER band. The -l flag removes this
-20 band and limit, producing as many bands as necessary. No key is
-produced as it won't fit!. It is useful for creation time profiles
-with many bands.
-
-
-
-
--m<int>
-
-
-Normally a profile is limited to 20 bands with additional identifiers
-being grouped into an OTHER band. The -m flag specifies an
-alternative band limit (the maximum is 20).
-
-
-
--m0 requests the band limit to be removed. As many bands as
-necessary are produced. However no key is produced as it won't fit! It
-is useful for displaying creation time profiles with many bands.
-
-
-
-
--p
-
-
-Use previous parameters. By default, the PostScript graph is
-automatically scaled both horizontally and vertically so that it fills
-the page. However, when preparing a series of graphs for use in a
-presentation, it is often useful to draw a new graph using the same
-scale, shading and ordering as a previous one. The -p flag causes
-the graph to be drawn using the parameters determined by a previous
-run of hp2ps on file. These are extracted from
-file@.aux.
-
-
-
-
--s
-
-
-Use a small box for the title.
-
-
-
-
--t<float>
-
-
-Normally trace elements which sum to a total of less than 1% of the
-profile are removed from the profile. The -t option allows this
-percentage to be modified (maximum 5%).
-
-
-
--t0 requests no trace elements to be removed from the profile,
-ensuring that all the data will be displayed.
-
-
-
-
--?
-
-
-Print out usage information.
-
-
-
-
-
-
-
-
-
-stat2resid—residency info from GC stats
-
-
-
-stat2resid (utility)
-GC stats—residency info
-residency, from GC stats
-
-
-
-Usage:
-
-
-
-
-
-stat2resid [<file>[.stat] [<outfile>]]
-
-
-
-
-
-The program stat2residstat2resid converts a detailed
-garbage collection statistics file produced by the
--S-S RTS option runtime option into a PostScript heap
-residency graph. The garbage collection statistics file can be
-produced without compiling your program for profiling.
-
-
-
-By convention, the file to be processed by stat2resid has a
-.stat extension. If the <outfile> is not specified the
-PostScript will be written to <file>@.resid.ps. If
-<file> is omitted entirely, then the program behaves as a filter.
-
-
-
-The plot can not be produced from the statistics file for a
-generational collector, though a suitable stats file can be produced
-using the -F2s-F2s RTS option runtime option when the
-program has been compiled for generational garbage collection (the
-default).
-
-
-
-stat2resid is distributed in ghc/utils/stat2resid in a GHC source
-distribution.
-
-
-
-
-
-
-
-Using ``ticky-ticky'' profiling (for implementors)
-
-
-
-ticky-ticky profiling (implementors)
-
-
-
-(ToDo: document properly.)
-
-
-
-It is possible to compile Glasgow Haskell programs so that they will
-count lots and lots of interesting things, e.g., number of updates,
-number of data constructors entered, etc., etc. We call this
-``ticky-ticky'' profiling,ticky-ticky profiling
-profiling, ticky-ticky because that's the sound a Sun4 makes
-when it is running up all those counters (slowly).
-
-
-
-Ticky-ticky profiling is mainly intended for implementors; it is quite
-separate from the main ``cost-centre'' profiling system, intended for
-all users everywhere.
-
-
-
-To be able to use ticky-ticky profiling, you will need to have built
-appropriate libraries and things when you made the system. See
-``Customising what libraries to build,'' in the installation guide.
-
-
-
-To get your compiled program to spit out the ticky-ticky numbers, use
-a -r RTS option-r RTS option. See .
-
-
-
-Compiling your program with the -ticky switch yields an executable
-that performs these counts. Here is a sample ticky-ticky statistics
-file, generated by the invocation foo +RTS -rfoo.ticky.
-
-
-
-
-
+
+ Profiling
+ profiling
+
+ cost-centre profiling
+
+ Glasgow Haskell comes with a time and space profiling
+ system. Its purpose is to help you improve your understanding of
+ your program's execution behaviour, so you can improve it.
+
+ Any comments, suggestions and/or improvements you have are
+ welcome. Recommended “profiling tricks” would be
+ especially cool!
+
+ Profiling a program is a three-step process:
+
+
+
+ Re-compile your program for profiling with the
+ -prof option, and probably one of the
+ -auto or -auto-all
+ options. These options are described in more detail in
+ -prof
+
+ -auto
+
+ -auto-all
+
+
+
+
+ Run your program with one of the profiling options, eg.
+ +RTS -p -RTS. This generates a file of
+ profiling information.
+ RTS
+ option
+
+
+
+ Examine the generated profiling information, using one of
+ GHC's profiling tools. The tool to use will depend on the kind
+ of profiling information generated.
+
+
+
+
+
+ Cost centres and cost-centre stacks
+
+ GHC's profiling system assigns costs
+ to cost centres. A cost is simply the time
+ or space required to evaluate an expression. Cost centres are
+ program annotations around expressions; all costs incurred by the
+ annotated expression are assigned to the enclosing cost centre.
+ Furthermore, GHC will remember the stack of enclosing cost centres
+ for any given expression at run-time and generate a call-graph of
+ cost attributions.
+
+ Let's take a look at an example:
+
+
+main = print (nfib 25)
+nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2)
+
+
+ Compile and run this program as follows:
+
+
+$ ghc -prof -auto-all -o Main Main.hs
+$ ./Main +RTS -p
+121393
+$
+
+
+      When a GHC-compiled program is run with the
+      -p RTS option, it generates a file called
+      <prog>.prof.  In this case, the file
+      will contain something like this:
+
+
+ Fri May 12 14:06 2000 Time and Allocation Profiling Report (Final)
+
+ Main +RTS -p -RTS
+
+ total time = 0.14 secs (7 ticks @ 20 ms)
+ total alloc = 8,741,204 bytes (excludes profiling overheads)
+
+COST CENTRE MODULE %time %alloc
+
+nfib Main 100.0 100.0
+
+
+ individual inherited
+COST CENTRE MODULE entries %time %alloc %time %alloc
+
+MAIN MAIN 0 0.0 0.0 100.0 100.0
+ main Main 0 0.0 0.0 0.0 0.0
+ CAF PrelHandle 3 0.0 0.0 0.0 0.0
+ CAF PrelAddr 1 0.0 0.0 0.0 0.0
+ CAF Main 6 0.0 0.0 100.0 100.0
+ main Main 1 0.0 0.0 100.0 100.0
+ nfib Main 242785 100.0 100.0 100.0 100.0
+
+
+
+ The first part of the file gives the program name and
+ options, and the total time and total memory allocation measured
+ during the run of the program (note that the total memory
+ allocation figure isn't the same as the amount of
+ live memory needed by the program at any one
+ time; the latter can be determined using heap profiling, which we
+ will describe shortly).
+
+ The second part of the file is a break-down by cost centre
+ of the most costly functions in the program. In this case, there
+ was only one significant function in the program, namely
+ nfib, and it was responsible for 100%
+ of both the time and allocation costs of the program.
+
+ The third and final section of the file gives a profile
+ break-down by cost-centre stack. This is roughly a call-graph
+ profile of the program. In the example above, it is clear that
+ the costly call to nfib came from
+ main.
+
+ The time and allocation incurred by a given part of the
+ program is displayed in two ways: “individual”, which
+ are the costs incurred by the code covered by this cost centre
+ stack alone, and “inherited”, which includes the costs
+ incurred by all the children of this node.
+
+ The usefulness of cost-centre stacks is better demonstrated
+ by modifying the example slightly:
+
+
+main = print (f 25 + g 25)
+f n = nfib n
+g n = nfib (n `div` 2)
+nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2)
+
+
+ Compile and run this program as before, and take a look at
+ the new profiling results:
+
+
+COST CENTRE MODULE entries %time %alloc %time %alloc
+
+MAIN MAIN 0 0.0 0.0 100.0 100.0
+ main Main 0 0.0 0.0 0.0 0.0
+ CAF PrelHandle 3 0.0 0.0 0.0 0.0
+ CAF PrelAddr 1 0.0 0.0 0.0 0.0
+ CAF Main 9 0.0 0.0 100.0 100.0
+ main Main 1 0.0 0.0 100.0 100.0
+ g Main 1 0.0 0.0 0.0 0.2
+ nfib Main 465 0.0 0.2 0.0 0.2
+ f Main 1 0.0 0.0 100.0 99.8
+ nfib Main 242785 100.0 99.8 100.0 99.8
+
+
+ Now although we had two calls to nfib
+ in the program, it is immediately clear that it was the call from
+ f which took all the time.
+
+ The actual meaning of the various columns in the output is:
+
+
+
+ entries
+
+ The number of times this particular point in the call
+ graph was entered.
+
+
+
+
+ individual %time
+
+ The percentage of the total run time of the program
+ spent at this point in the call graph.
+
+
+
+
+ individual %alloc
+
+ The percentage of the total memory allocations
+ (excluding profiling overheads) of the program made by this
+ call.
+
+
+
+
+ inherited %time
+
+ The percentage of the total run time of the program
+ spent below this point in the call graph.
+
+
+
+
+ inherited %alloc
+
+ The percentage of the total memory allocations
+ (excluding profiling overheads) of the program made by this
+ call and all of its sub-calls.
+
+
+
+
+    In addition you can use the -P RTS option
+    to get the following additional information:
+
+
+
+ ticks
+
+ The raw number of time “ticks” which were
+ attributed to this cost-centre; from this, we get the
+ %time figure mentioned
+ above.
+
+
+
+
+ bytes
+
+ Number of bytes allocated in the heap while in this
+ cost-centre; again, this is the raw number from which we get
+ the %alloc figure mentioned
+ above.
+
+
+
+
+ What about recursive functions, and mutually recursive
+ groups of functions? Where are the costs attributed? Well,
+ although GHC does keep information about which groups of functions
+ called each other recursively, this information isn't displayed in
+    the basic time and allocation profile; instead, the call-graph is
+ flattened into a tree. The XML profiling tool (described in ) will be able to display real loops in
+ the call-graph.
+
+ Inserting cost centres by hand
+
+ Cost centres are just program annotations. When you say
+      -auto-all to the compiler, it automatically
+ inserts a cost centre annotation around every top-level function
+ in your program, but you are entirely free to add the cost
+ centre annotations yourself.
+
+ The syntax of a cost centre annotation is
+
+
+ {-# SCC "name" #-} <expression>
+
+
+      where "name" is an arbitrary string
+ that will become the name of your cost centre as it appears
+ in the profiling output, and
+ <expression> is any Haskell
+ expression. An SCC annotation extends as
+ far to the right as possible when parsing.
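+      For example, the pipeline below annotates each phase by hand (a
+      minimal sketch of our own; the cost-centre names are arbitrary
+      labels):

```haskell
module Main where

-- Hand-inserted cost centres: when compiled with -prof, the costs of
-- each annotated expression are attributed to the named cost centre;
-- without -prof the SCC pragmas are ignored and the program behaves
-- identically.
main :: IO ()
main = print result
  where
    xs     = [1 .. 1000] :: [Int]
    pass1  = {-# SCC "Pass1" #-} map (* 2) xs
    pass2  = {-# SCC "Pass2" #-} filter (> 1000) pass1
    result = {-# SCC "Sum" #-}   sum pass2
```

+      Running the compiled program with +RTS -p would
+      then report Pass1, Pass2 and
+      Sum as separate rows in the profile.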
+
+
+
+
+ Rules for attributing costs
+
+ The cost of evaluating any expression in your program is
+ attributed to a cost-centre stack using the following rules:
+
+
+
+ If the expression is part of the
+ one-off costs of evaluating the
+ enclosing top-level definition, then costs are attributed to
+ the stack of lexically enclosing SCC
+ annotations on top of the special CAF
+ cost-centre.
+
+
+
+ Otherwise, costs are attributed to the stack of
+ lexically-enclosing SCC annotations,
+ appended to the cost-centre stack in effect at the
+ call site of the current top-level
+      definition.  (The call-site is just the place
+      in the source code which mentions the particular function or
+      variable.)  Notice that this is a recursive
+      definition.
+
+
+
+ Time spent in foreign code (see )
+ is always attributed to the cost centre in force at the
+ Haskell call-site of the foreign function.
+
+
+
+ What do we mean by one-off costs? Well, Haskell is a lazy
+ language, and certain expressions are only ever evaluated once.
+ For example, if we write:
+
+
+x = nfib 25
+
+
+ then x will only be evaluated once (if
+ at all), and subsequent demands for x will
+ immediately get to see the cached result. The definition
+ x is called a CAF (Constant Applicative
+ Form), because it has no arguments.
+
+ For the purposes of profiling, we say that the expression
+ nfib 25 belongs to the one-off costs of
+ evaluating x.
+
+ Since one-off costs aren't strictly speaking part of the
+ call-graph of the program, they are attributed to a special
+ top-level cost centre, CAF. There may be one
+ CAF cost centre for each module (the
+ default), or one for each top-level definition with any one-off
+    costs (this behaviour can be selected by giving GHC the
+    -caf-all flag).
+
+ -caf-all
+
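+    As a small illustration (our own sketch, not from the GHC
+    sources), a module with two top-level definitions that take no
+    arguments contains two CAFs:

```haskell
module Main where

-- 'table' and 'total' are CAFs: top-level definitions with no
-- arguments, each evaluated at most once.  Under -prof their one-off
-- costs go to a single CAF cost centre for the module by default, or
-- to one cost centre per definition under -caf-all.
table :: [Int]
table = map (^ 2) [1 .. 100]

total :: Int
total = sum table

main :: IO ()
main = print total
```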
+
+ If you think you have a weird profile, or the call-graph
+ doesn't look like you expect it to, feel free to send it (and
+ your program) to us at
+ glasgow-haskell-bugs@haskell.org.
+
+
+
+
+ Compiler options for profiling
+
+ profilingoptions
+ optionsfor profiling
+
+
+
+      -prof:
+
+
+ To make use of the profiling system
+ all modules must be compiled and linked
+      with the -prof option.  Any
+ SCC annotations you've put in your source
+ will spring to life.
+
+      Without a -prof option, your
+ SCCs are ignored; so you can compile
+ SCC-laden code without changing
+ it.
+
+
+
+
+ There are a few other profiling-related compilation options.
+ Use them in addition to
+    -prof.  These do not have to be used consistently
+ for all modules in a program.
+
+
+
+      -auto:
+
+ cost centresautomatically inserting
+
+ GHC will automatically add
+ _scc_ constructs for all
+ top-level, exported functions.
+
+
+
+
+      -auto-all:
+
+
+ All top-level functions,
+ exported or not, will be automatically
+ _scc_'d.
+
+
+
+
+      -caf-all:
+
+
+ The costs of all CAFs in a module are usually
+ attributed to one “big” CAF cost-centre. With
+ this option, all CAFs get their own cost-centre. An
+ “if all else fails” option…
+
+
+
+
+      -ignore-scc:
+
+
+ Ignore any _scc_
+ constructs, so a module which already has
+ _scc_s can be compiled
+ for profiling with the annotations ignored.
+
+
+
+
+
+
+
+
+ Time and allocation profiling
+
+ To generate a time and allocation profile, give one of the
+ following RTS options to the compiled program when you run it (RTS
+ options should be enclosed between +RTS...-RTS
+ as usual):
+
+
+
+      -p or -P:
+
+
+ time profile
+
+        The -p option produces a standard
+ time profile report. It is written
+ into the file
+ program.prof.
+
+        The -P option produces a more
+ detailed report containing the actual time and allocation
+ data as well. (Not used much.)
+
+
+
+
+      -px:
+
+
+        The -px option generates profiling
+ information in the XML format understood by our new
+ profiling tool, see .
+
+
+
+
+
+      -xc
+      RTS option
+
+ This option makes use of the extra information
+ maintained by the cost-centre-stack profiler to provide
+ useful information about the location of runtime errors.
+ See .
+
+
+
+
+
+
+
+
+ Profiling memory usage
+
+ In addition to profiling the time and allocation behaviour
+ of your program, you can also generate a graph of its memory usage
+ over time. This is useful for detecting the causes of
+ space leaks, when your program holds on to
+    more memory at run-time than it needs to.  Space leaks lead to
+    longer run-times due to heavy garbage collector activity, and may
+    even cause the program to run out of memory altogether.
+ even cause the program to run out of memory altogether.
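+    As an illustration of the kind of problem heap profiling uncovers
+    (a minimal sketch of our own), a lazy accumulator builds a long
+    chain of unevaluated thunks that would show up as a steadily
+    growing band on the heap profile, while its strict counterpart
+    runs in constant space:

```haskell
module Main where

import Data.List (foldl')

-- The lazy left fold accumulates ~100,000 pending (+) thunks before
-- anything is demanded (a space leak); foldl' forces the accumulator
-- at each step instead, so it needs only constant heap.
leaky, strict :: [Int] -> Int
leaky  = foldl  (+) 0
strict = foldl' (+) 0

main :: IO ()
main = do
  print (leaky  [1 .. 100000])
  print (strict [1 .. 100000])
```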
+
+ To generate a heap profile from your program:
+
+
+
+ Compile the program for profiling ().
+
+
+ Run it with one of the heap profiling options described
+        below (eg. -hc for a basic producer profile).
+ This generates the file
+ prog.hp.
+
+
+        Run hp2ps to produce a PostScript
+ file,
+ prog.ps. The
+ hp2ps utility is described in detail in
+ .
+
+
+        Display the heap profile using a PostScript viewer such
+        as Ghostview, or print it out on a
+        PostScript-capable printer.
+
+
+
+
+ RTS options for heap profiling
+
+ There are several different kinds of heap profile that can
+ be generated. All the different profile types yield a graph of
+ live heap against time, but they differ in how the live heap is
+ broken down into bands. The following RTS options select which
+ break-down to use:
+
+
+
+
+      -hc
+      RTS option
+
+ Breaks down the graph by the cost-centre stack which
+ produced the data.
+
+
+
+
+
+      -hm
+      RTS option
+
+        Breaks down the live heap by the module containing
+ the code which produced the data.
+
+
+
+
+
+      -hd
+      RTS option
+
+ Breaks down the graph by closure
+ description. For actual data, the description
+ is just the constructor name, for other closures it is a
+ compiler-generated string identifying the closure.
+
+
+
+
+
+      -hy
+      RTS option
+
+ Breaks down the graph by
+ type. For closures which have
+ function type or unknown/polymorphic type, the string will
+ represent an approximation to the actual type.
+
+
+
+
+
+      -hr
+      RTS option
+
+        Breaks down the graph by retainer
+ set. Retainer profiling is described in more
+ detail below ().
+
+
+
+
+
+      -hb
+      RTS option
+
+        Breaks down the graph by
+ biography. Biographical profiling
+ is described in more detail below ().
+
+
+
+
+ In addition, the profile can be restricted to heap data
+    which satisfies certain criteria: for example, you might want
+ to display a profile by type but only for data produced by a
+ certain module, or a profile by retainer for a certain type of
+ data. Restrictions are specified as follows:
+
+
+
+      -hcname,...
+      RTS option
+
+ Restrict the profile to closures produced by
+ cost-centre stacks with one of the specified cost centres
+ at the top.
+
+
+
+
+      -hCname,...
+      RTS option
+
+ Restrict the profile to closures produced by
+ cost-centre stacks with one of the specified cost centres
+ anywhere in the stack.
+
+
+
+
+      -hmmodule,...
+      RTS option
+
+ Restrict the profile to closures produced by the
+ specified modules.
+
+
+
+
+      -hddesc,...
+      RTS option
+
+ Restrict the profile to closures with the specified
+ description strings.
+
+
+
+
+      -hytype,...
+      RTS option
+
+ Restrict the profile to closures with the specified
+ types.
+
+
+
+
+      -hrcc,...
+      RTS option
+
+ Restrict the profile to closures with retainer sets
+ containing cost-centre stacks with one of the specified
+ cost centres at the top.
+
+
+
+
+      -hbbio,...
+      RTS option
+
+ Restrict the profile to closures with one of the
+ specified biographies, where
+ bio is one of
+ lag, drag,
+ void, or use.
+
+
+
+
+ For example, the following options will generate a
+ retainer profile restricted to Branch and
+ Leaf constructors:
+
+
+prog +RTS -hr -hdBranch,Leaf
+
+
+ There can only be one "break-down" option
+    (eg. -hr in the example above), but there is no
+ limit on the number of further restrictions that may be applied.
+ All the options may be combined, with one exception: GHC doesn't
+ currently support mixing the and
+ options.
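+      For instance (the module name here is illustrative), a profile
+      broken down by closure type but restricted to data produced by
+      module Main could be requested with:
+
```
./prog +RTS -hy -hmMain -RTS
```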
+
+ There's one more option which relates to heap
+ profiling:
+
+
+
+ :
+
+
+	    Set the profiling (sampling) interval to
+	    secs seconds (the default is
+	    0.1 second).  Fractions are allowed: for example, an
+	    interval of 0.2 seconds yields 5 samples per second.
+	    This only affects heap profiling; time profiles are always
+	    sampled at 1/50 second intervals.
+
+
+
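+      For example, assuming an executable prog built
+      for profiling (the name is illustrative), the following takes a
+      heap sample four times a second:
+
```
./prog +RTS -hc -i0.25 -RTS
```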
+
+
+
+
+ Retainer Profiling
+
+ Retainer profiling is designed to help answer questions
+ like why is this data being retained?. We start
+ by defining what we mean by a retainer:
+
+
+ A retainer is either the system stack, or an unevaluated
+ closure (thunk).
+
+
+ In particular, constructors are not
+ retainers.
+
+ An object A is retained by an object B if object A can be
+ reached by recursively following pointers starting from object
+ B but not meeting any other retainers on the way. Each object
+ has one or more retainers, collectively called its
+ retainer set.
+
+ When retainer profiling is requested by giving the program
+ the option, a graph is generated which is
+ broken down by retainer set. A retainer set is displayed as a
+ set of cost-centre stacks; because this is usually too large to
+ fit on the profile graph, each retainer set is numbered and
+ shown abbreviated on the graph along with its number, and the
+ full list of retainer sets is dumped into the file
+ prog.prof.
+
+ Retainer profiling requires multiple passes over the live
+ heap in order to discover the full retainer set for each
+      object, which can be quite slow.  So a limit is placed on the
+      maximum size of a retainer set: any retainer set larger
+      than the maximum is replaced by the special
+      set MANY.  The maximum set size defaults to 8
+ and can be altered with the RTS
+ option:
+
+
+
+ size
+
+ Restrict the number of elements in a retainer set to
+ size (default 8).
+
+
+
+
+
+ Hints for using retainer profiling
+
+ The definition of retainers is designed to reflect a
+ common cause of space leaks: a large structure is retained by
+ an unevaluated computation, and will be released once the
+	computation is forced.  A good example is looking up a value in
+	a finite map: unless the lookup is forced in a timely
+	manner, the unevaluated lookup will cause the whole mapping to
+	be retained.  This kind of space leak can often be
+ eliminated by forcing the relevant computations to be
+ performed eagerly, using seq or strictness
+ annotations on data constructor fields.
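+	As an illustrative sketch (the function names here are made up,
+	not part of any GHC API), the lookup result can be forced with
+	seq before it escapes:
+
```haskell
import qualified Data.Map as Map

-- Hypothetical example: look up a value and force it immediately,
-- so the unevaluated lookup thunk cannot retain the whole map.
process :: Map.Map Int String -> String
process m =
  let v = Map.findWithDefault "?" 42 m
  in  v `seq` ("result: " ++ v)   -- `seq` forces the lookup here

main :: IO ()
main = putStrLn (process (Map.fromList [(42, "answer"), (7, "other")]))
```
+
+	Without the seq, the thunk bound to
+	v would keep a reference to the whole of
+	m for as long as the result stayed unevaluated.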
+
+ Often a particular data structure is being retained by a
+ chain of unevaluated closures, only the nearest of which will
+	be reported by retainer profiling: for example, A retains B, B
+ retains C, and C retains a large structure. There might be a
+ large number of Bs but only a single A, so A is really the one
+ we're interested in eliminating. However, retainer profiling
+ will in this case report B as the retainer of the large
+ structure. To move further up the chain of retainers, we can
+ ask for another retainer profile but this time restrict the
+ profile to B objects, so we get a profile of the retainers of
+ B:
+
+
+prog +RTS -hr -hcB
+
+
+ This trick isn't foolproof, because there might be other
+ B closures in the heap which aren't the retainers we are
+ interested in, but we've found this to be a useful technique
+ in most cases.
+
+
+
+
+ Biographical Profiling
+
+ A typical heap object may be in one of the following four
+ states at each point in its lifetime:
+
+
+
+ The lag stage, which is the
+ time between creation and the first use of the
+ object,
+
+
+	  The use stage, which lasts from
+ the first use until the last use of the object, and
+
+
+ The drag stage, which lasts
+ from the final use until the last reference to the object
+ is dropped.
+
+
+ An object which is never used is said to be in the
+ void state for its whole
+ lifetime.
+
+
+
+ A biographical heap profile displays the portion of the
+ live heap in each of the four states listed above. Usually the
+ most interesting states are the void and drag states: live heap
+ in these states is more likely to be wasted space than heap in
+ the lag or use states.
+
+      It is also possible to break down the heap in one or more
+      of these states by a different criterion, by restricting a
+      profile by biography.  For example, to show the portion of the
+      heap in the drag or void state by producer:
+
+
+prog +RTS -hc -hbdrag,void
+
+
+ Once you know the producer or the type of the heap in the
+ drag or void states, the next step is usually to find the
+ retainer(s):
+
+
+prog +RTS -hr -hccc...
+
+
+      NOTE: this two-stage process is required because GHC
+ cannot currently profile using both biographical and retainer
+ information simultaneously.
+
+
+
+
+
+ Graphical time/allocation profile
+
+ You can view the time and allocation profiling graph of your
+ program graphically, using ghcprof. This is a
+    new tool with GHC 4.08, and will eventually be the de facto
+    standard way of viewing GHC profiles.  (Actually this
+    isn't true any more: we are working on a new tool for
+    displaying heap profiles using Gtk+HS, so
+    ghcprof may go away at some point in the future.)
+
+
+ To run ghcprof, you need
+ daVinci installed, which can be
+ obtained from The Graph
+ Visualisation Tool daVinci. Install one of
+ the binary
+    distributions (daVinci is
+    sadly not open-source :-(), and set your
+ DAVINCIHOME environment variable to point to the
+ installation directory.
+
+ ghcprof uses an XML-based profiling log
+ format, and you therefore need to run your program with a
+ different option: . The file generated is
+ still called <prog>.prof. To see the
+ profile, run ghcprof like this:
+
+
+
+
+$ ghcprof <prog>.prof
+
+
+ which should pop up a window showing the call-graph of your
+ program in glorious detail. More information on using
+ ghcprof can be found at The
+ Cost-Centre Stack Profiling Tool for
+ GHC.
+
+
+
+
+ hp2ps––heap profile to PostScript
+
+ hp2ps
+ heap profiles
+ postscript, from heap profiles
+
+
+ Usage:
+
+
+hp2ps [flags] [<file>[.hp]]
+
+
+ The program
+ hp2pshp2ps
+ program converts a heap profile as produced
+ by the runtime option into a
+ PostScript graph of the heap profile. By convention, the file to
+ be processed by hp2ps has a
+ .hp extension. The PostScript output is
+    written to <file>.ps.  If
+ <file> is omitted entirely, then the
+ program behaves as a filter.
+
+ hp2ps is distributed in
+ ghc/utils/hp2ps in a GHC source
+ distribution. It was originally developed by Dave Wakeling as part
+ of the HBC/LML heap profiler.
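+    A typical workflow, assuming an executable prog
+    built for profiling (the name is illustrative, and any PostScript
+    viewer will do for the last step), might be:
+
```
./prog +RTS -hc -RTS     # run the program, writing prog.hp
hp2ps -c prog.hp         # convert the heap profile to colour PostScript (prog.ps)
gv prog.ps               # view the result
```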
+
+ The flags are:
+
+
+
+
+
+
+ In order to make graphs more readable,
+ hp2ps sorts the shaded bands for each
+ identifier. The default sort ordering is for the bands with
+ the largest area to be stacked on top of the smaller ones.
+ The option causes rougher bands (those
+ representing series of values with the largest standard
+ deviations) to be stacked on top of smoother ones.
+
+
+
+
+
+
+ Normally, hp2ps puts the title of
+ the graph in a small box at the top of the page. However, if
+ the JOB string is too long to fit in a small box (more than
+ 35 characters), then hp2ps will choose to
+ use a big box instead. The option
+ forces hp2ps to use a big box.
+
+
+
+
+
+
+ Generate encapsulated PostScript suitable for
+ inclusion in LaTeX documents. Usually, the PostScript graph
+ is drawn in landscape mode in an area 9 inches wide by 6
+ inches high, and hp2ps arranges for this
+	    area to be approximately centred on a sheet of A4 paper.
+	    This format is convenient for studying the graph in detail,
+	    but it is unsuitable for inclusion in LaTeX documents.  The
+ option causes the graph to be drawn in
+ portrait mode, with float specifying the width in inches,
+ millimetres or points (the default). The resulting
+ PostScript file conforms to the Encapsulated PostScript
+ (EPS) convention, and it can be included in a LaTeX document
+ using Rokicki's dvi-to-PostScript converter
+ dvips.
+
+
+
+
+
+
+ Create output suitable for the gs
+ PostScript previewer (or similar). In this case the graph is
+ printed in portrait mode without scaling. The output is
+ unsuitable for a laser printer.
+
+
+
+
+
+
+ Normally a profile is limited to 20 bands with
+ additional identifiers being grouped into an
+ OTHER band. The flag
+	    removes this 20 band limit, producing as many bands as
+	    necessary.  No key is produced, as it won't fit.  It is useful
+	    for displaying creation time profiles with many bands.
+
+
+
+
+
+
+ Normally a profile is limited to 20 bands with
+ additional identifiers being grouped into an
+ OTHER band. The flag
+ specifies an alternative band limit (the maximum is
+ 20).
+
+ requests the band limit to be
+	    removed.  As many bands as necessary are produced.  However, no
+ key is produced as it won't fit! It is useful for displaying
+ creation time profiles with many bands.
+
+
+
+
+
+
+ Use previous parameters. By default, the PostScript
+ graph is automatically scaled both horizontally and
+ vertically so that it fills the page. However, when
+ preparing a series of graphs for use in a presentation, it
+ is often useful to draw a new graph using the same scale,
+ shading and ordering as a previous one. The
+ flag causes the graph to be drawn using
+ the parameters determined by a previous run of
+ hp2ps on file. These
+	    are extracted from file.aux.
+
+
+
+
+
+
+ Use a small box for the title.
+
+
+
+
+
+
+ Normally trace elements which sum to a total of less
+ than 1% of the profile are removed from the
+ profile. The option allows this
+ percentage to be modified (maximum 5%).
+
+ requests no trace elements to be
+ removed from the profile, ensuring that all the data will be
+ displayed.
+
+
+
+
+
+
+ Generate colour output.
+
+
+
+
+
+
+ Ignore marks.
+
+
+
+
+
+
+ Print out usage information.
+
+
+
+
+
+
+ Using “ticky-ticky” profiling (for implementors)
+ ticky-ticky profiling
+
+ (ToDo: document properly.)
+
+ It is possible to compile Glasgow Haskell programs so that
+ they will count lots and lots of interesting things, e.g., number
+ of updates, number of data constructors entered, etc., etc. We
+ call this “ticky-ticky”
+    profiling, because that's the sound a Sun4
+ makes when it is running up all those counters
+ (slowly).
+
+ Ticky-ticky profiling is mainly intended for implementors;
+ it is quite separate from the main “cost-centre”
+ profiling system, intended for all users everywhere.
+
+ To be able to use ticky-ticky profiling, you will need to
+ have built appropriate libraries and things when you made the
+ system. See “Customising what libraries to build,” in
+ the installation guide.
+
+ To get your compiled program to spit out the ticky-ticky
+    numbers, use the -r RTS
+    option.
+ See .
+
+ Compiling your program with the
+ switch yields an executable that performs these counts. Here is a
+ sample ticky-ticky statistics file, generated by the invocation
+ foo +RTS -rfoo.ticky.
+
+
foo +RTS -rfoo.ticky
@@ -983,30 +1181,33 @@ Total bytes copied during GC: 190096
0 GC_SEL_MAJOR_ctr
0 GC_FAILED_PROMOTION_ctr
47524 GC_WORDS_COPIED_ctr
-
-
-
-
-
-The formatting of the information above the row of asterisks is
-subject to change, but hopefully provides a useful human-readable
-summary. Below the asterisks all counters maintained by the
-ticky-ticky system are dumped, in a format intended to be
-machine-readable: zero or more spaces, an integer, a space, the
-counter name, and a newline.
-
-
-
-In fact, not all counters are necessarily dumped; compile- or
-run-time flags can render certain counters invalid. In this case,
-either the counter will simply not appear, or it will appear with a
-modified counter name, possibly along with an explanation for the
-omission (notice ENT_PERM_IND_ctr appears with an inserted !
-above). Software analysing this output should always check that it
-has the counters it expects. Also, beware: some of the counters can
-have large values!
-
-
-
-
-
+
+
+ The formatting of the information above the row of asterisks
+ is subject to change, but hopefully provides a useful
+ human-readable summary. Below the asterisks all
+ counters maintained by the ticky-ticky system are
+ dumped, in a format intended to be machine-readable: zero or more
+ spaces, an integer, a space, the counter name, and a newline.
+
+ In fact, not all counters are
+ necessarily dumped; compile- or run-time flags can render certain
+ counters invalid. In this case, either the counter will simply
+ not appear, or it will appear with a modified counter name,
+ possibly along with an explanation for the omission (notice
+ ENT_PERM_IND_ctr appears
+ with an inserted ! above). Software analysing
+ this output should always check that it has the counters it
+ expects. Also, beware: some of the counters can have
+ large values!
+
+
+
+
+
+