X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=ghc%2Fdocs%2Fusers_guide%2Fprofiling.sgml;h=662164545bf82bfae7f3b80d4b0f8f3ac3960cdd;hb=73641e01ee9dfbe83f8c6225c1f6ae2e7d621b63;hp=a0bd4f68b51f46de1e97bc9fb0f1947997722ff2;hpb=c7f8f1e62b555462f98c3f813440559116033a99;p=ghc-hetmet.git diff --git a/ghc/docs/users_guide/profiling.sgml b/ghc/docs/users_guide/profiling.sgml index a0bd4f6..6621645 100644 --- a/ghc/docs/users_guide/profiling.sgml +++ b/ghc/docs/users_guide/profiling.sgml @@ -1,909 +1,1107 @@ - -Profiling - - - -profiling, with cost-centres -cost-centre profiling - - - -Glasgow Haskell comes with a time and space profiling system. Its -purpose is to help you improve your understanding of your program's -execution behaviour, so you can improve it. - - - -Any comments, suggestions and/or improvements you have are welcome. -Recommended ``profiling tricks'' would be especially cool! - - - -How to profile a Haskell program - - - -The GHC approach to profiling is very simple: annotate the expressions -you consider ``interesting'' with cost centre labels (strings); -so, for example, you might have: - - - - - -f x y - = let - output1 = _scc_ "Pass1" ( pass1 x ) - output2 = _scc_ "Pass2" ( pass2 output1 y ) - output3 = _scc_ "Pass3" ( pass3 (output2 `zip` [1 .. ]) ) - in concat output3 - - - - - -The costs of the evaluating the expressions bound to output1, -output2 and output3 will be attributed to the ``cost -centres'' Pass1, Pass2 and Pass3, respectively. - - - -The costs of evaluating other expressions, e.g., concat output4, -will be inherited by the scope which referenced the function f. - - - -You can put in cost-centres via _scc_ constructs by hand, as in the -example above. Perfectly cool. That's probably what you -would do if your program divided into obvious ``passes'' or -``phases'', or whatever. 
- - - -If your program is large or you have no clue what might be gobbling -all the time, you can get GHC to mark all functions with _scc_ -constructs, automagically. Add an -auto compilation flag to the -usual -prof option. - - - -Once you start homing in on the Guilty Suspects, you may well switch -from automagically-inserted cost-centres to a few well-chosen ones of -your own. - - - -To use profiling, you must compile and run with special -options. (We usually forget the ``run'' magic!—Do as we say, not as -we do…) Details follow. - - - -If you're serious about this profiling game, you should probably read -one or more of the Sansom/Peyton Jones papers about the GHC profiling -system. Just visit the Glasgow FP group web page… - - - - - -Compiling programs for profiling - - - -profiling options -options, for profiling - - - -To make use of the cost centre profiling system all modules must -be compiled and linked with the -prof option.-prof option -Any _scc_ constructs you've put in your source will spring to life. - - - -Without a -prof option, your _scc_s are ignored; so you can -compiled _scc_-laden code without changing it. - - - -There are a few other profiling-related compilation options. Use them -in addition to -prof. These do not have to be used -consistently for all modules in a program. - - - - - - --auto: - - --auto option -cost centres, automatically inserting -GHC will automatically add _scc_ constructs for -all top-level, exported functions. - - - - --auto-all: - - --auto-all option -All top-level functions, exported or not, will be automatically -_scc_'d. - - - - --caf-all: - - --caf-all option -The costs of all CAFs in a module are usually attributed to one -``big'' CAF cost-centre. With this option, all CAFs get their own cost-centre. -An ``if all else fails'' option… - - - - --ignore-scc: - - --ignore-scc option -Ignore any _scc_ constructs, -so a module which already has _scc_s can be -compiled for profiling with the annotations ignored. 
- - - - --G<group>: - - --G<group> option -Specifies the <group> to be attached to all the cost-centres -declared in the module. If no group is specified it defaults to the -module name. - - - - - - - -In addition to the -prof option your system might be setup to enable -you to compile and link with the -prof-details -prof-details -option option instead. This enables additional detailed counts -to be reported with the -P RTS option. - - - - - -How to control your profiled program at runtime - - - -profiling RTS options -RTS options, for profiling - - - -It isn't enough to compile your program for profiling with -prof! - - - -When you run your profiled program, you must tell the runtime -system (RTS) what you want to profile (e.g., time and/or space), and -how you wish the collected data to be reported. You also may wish to -set the sampling interval used in time profiling. - - - -Executive summary: ./a.out +RTS -pT produces a time profile in -a.out.prof; ./a.out +RTS -hC produces space-profiling -info which can be mangled by hp2ps and viewed with ghostview -(or equivalent). - - - -Profiling runtime flags are passed to your program between the usual -+RTS and -RTS options. - - - - - - --p<sort> or -P<sort>: - - --p<sort> RTS option (profiling) --P<sort> RTS option (profiling) -time profile -serial time profile -The -p? option produces a standard time profile report. -It is written into the file <program>@.prof. - - - -The -P? option produces a more detailed report containing the -actual time and allocation data as well. (Not used much.) - - - -The <sort> indicates how the cost centres are to be sorted in the -report. Valid <sort> options are: - - - -T: - - -by time, largest first (the default); - - - - -A: - - -by bytes allocated, largest first; - - - - -C: - - -alphabetically by group, module and cost centre. - - - - - - - - --i<secs>: - - --i<secs> RTS option -(profiling) Set the profiling (sampling) interval to <secs> -seconds (the default is 1 second). 
Fractions are allowed: for example --i0.2 will get 5 samples per second. - - - - --h<break-down>: - - --h<break-down> RTS option (profiling) -heap profile - - - -Produce a detailed space profile of the heap occupied by live -closures. The profile is written to the file <program>@.hp from -which a PostScript graph can be produced using hp2ps (see -). - - - -The heap space profile may be broken down by different criteria: - - - --hC: - - -cost centre which produced the closure (the default). - - - - --hM: - - -cost centre module which produced the closure. - - - - --hG: - - -cost centre group which produced the closure. - - - - --hD: - - -closure description—a string describing the closure. - - - - --hY: - - -closure type—a string describing the closure's type. - - - - -By default all live closures in the heap are profiled, but particular -closures of interest can be selected (see below). - - - - - - - -Heap (space) profiling uses hash tables. If these tables -should fill the run will abort. The --z<tbl><size>-z<tbl><size> RTS option (profiling) option is used to -increase the size of the relevant hash table (C, M, -G, D or Y, defined as for <break-down> above). The -actual size used is the next largest power of 2. - - - -The heap profile can be restricted to particular closures of interest. -The closures of interest can selected by the attached cost centre -(module:label, module and group), closure category (description, type, -and kind) using the following options: - - - - - - --c{<mod>:<lab>,<mod>:<lab>...}: - - --c{<lab> RTS option (profiling)} -Selects individual cost centre(s). - - - - --m{<mod>,<mod>...}: - - --m{<mod> RTS option (profiling)} -Selects all cost centres from the module(s) specified. - - - - --g{<grp>,<grp>...}: - - --g{<grp> RTS option (profiling)} -Selects all cost centres from the groups(s) specified. - - - - --d{<des>,<des>...}: - - --d{<des> RTS option (profiling)} -Selects closures which have one of the specified descriptions. 
- - - - --y{<typ>,<typ>...}: - - --y{<typ> RTS option (profiling)} -Selects closures which have one of the specified type descriptions. - - - - --k{<knd>,<knd>...}: - - --k{<knd> RTS option (profiling)} -Selects closures which are of one of the specified closure kinds. -Valid closure kinds are CON (constructor), FN (manifest -function), PAP (partial application), BH (black hole) and -THK (thunk). - - - - - - - -The space occupied by a closure will be reported in the heap profile -if the closure satisfies the following logical expression: - - - -([-c] or [-m] or [-g]) and ([-d] or [-y] or [-k]) - - - -where a particular option is true if the closure (or its attached cost -centre) is selected by the option (or the option is not specified). - - - - - -What's in a profiling report? - - - -profiling report, meaning thereof - - - -When you run your profiled program with the -p RTS option -p -RTS option, you get the following information about your ``cost -centres'': - - - - - - -COST CENTRE: - - -The cost-centre's name. - - - - -MODULE: - - -The module associated with the cost-centre; -important mostly if you have identically-named cost-centres in -different modules. - - - - -scc: - - -How many times this cost-centre was entered; think -of it as ``I got to the _scc_ construct this many times…'' - - - - -%time: - - -What part of the time was spent in this cost-centre (see also ``ticks,'' -below). - - - - -%alloc: - - -What part of the memory allocation was done in this cost-centre -(see also ``bytes,'' below). - - - - -inner: - - -How many times this cost-centre ``passed control'' to an inner -cost-centre; for example, scc=4 plus subscc=8 means -``This _scc_ was entered four times, but went out to -other _scc_s eight times.'' - - - - -cafs: - - -CAF, profiling -How many CAFs this cost centre evaluated. - - - - -dicts: - - -Dictionaries, profiling -How many dictionaries this cost centre evaluated. 
- - - - - - - -In addition you can use the -P RTS option to get the following additional information: - - - -ticks: - - -The raw number of time ``ticks'' which were -attributed to this cost-centre; from this, we get the %time -figure mentioned above. - - - - -bytes: - - -Number of bytes allocated in the heap while in -this cost-centre; again, this is the raw number from which we -get the %alloc figure mentioned above. - - - - - - - -Finally if you built your program with -prof-details - the -P RTS option will also -produce the following information: - - - -closures: - - -closures, profiling -How many heap objects were allocated; these objects may be of varying -size. If you divide the number of bytes (mentioned below) by this -number of ``closures'', then you will get the average object size. -(Not too interesting, but still…) - - - - -thunks: - - -thunks, profiling -How many times we entered (evaluated) a thunk—an unevaluated -object in the heap—while we were in this cost-centre. - - - - -funcs: - - -functions, profiling -How many times we entered (evaluated) a function while we we in this -cost-centre. (In Haskell, functions are first-class values and may be -passed as arguments, returned as results, evaluated, and generally -manipulated just like data values) - - - - -PAPs: - - -partial applications, profiling -How many times we entered (evaluated) a partial application (PAP), i.e., -a function applied to fewer arguments than it needs. For example, Int -addition applied to one argument would be a PAP. A PAP is really -just a particular form for a function. - - - - - - - - - -Producing graphical heap profiles - - - -heap profiles, producing - - - -Utility programs which produce graphical profiles. 
- - - -<Literal>hp2ps</Literal>--heap profile to PostScript - - - -hp2ps (utility) -heap profiles -PostScript, from heap profiles - - - -Usage: - - - - - -hp2ps [flags] [<file>[.stat]] - - - - - -The program hp2pshp2ps program converts a heap profile -as produced by the -h<break-down>-h<break-down> RTS -option runtime option into a PostScript graph of the heap -profile. By convention, the file to be processed by hp2ps has a -.hp extension. The PostScript output is written to <file>@.ps. If -<file> is omitted entirely, then the program behaves as a filter. - - - -hp2ps is distributed in ghc/utils/hp2ps in a GHC source -distribution. It was originally developed by Dave Wakeling as part of -the HBC/LML heap profiler. - - - -The flags are: - - - --d - - -In order to make graphs more readable, hp2ps sorts the shaded -bands for each identifier. The default sort ordering is for the bands -with the largest area to be stacked on top of the smaller ones. The --d option causes rougher bands (those representing series of -values with the largest standard deviations) to be stacked on top of -smoother ones. - - - - --b - - -Normally, hp2ps puts the title of the graph in a small box at the -top of the page. However, if the JOB string is too long to fit in a -small box (more than 35 characters), then -hp2ps will choose to use a big box instead. The -b -option forces hp2ps to use a big box. - - - - --e<float>[in|mm|pt] - - -Generate encapsulated PostScript suitable for inclusion in LaTeX -documents. Usually, the PostScript graph is drawn in landscape mode -in an area 9 inches wide by 6 inches high, and hp2ps arranges -for this area to be approximately centred on a sheet of a4 paper. -This format is convenient of studying the graph in detail, but it is -unsuitable for inclusion in LaTeX documents. The -e option -causes the graph to be drawn in portrait mode, with float specifying -the width in inches, millimetres or points (the default). 
The -resulting PostScript file conforms to the Encapsulated PostScript -(EPS) convention, and it can be included in a LaTeX document using -Rokicki's dvi-to-PostScript converter dvips. - - - - --g - - -Create output suitable for the gs PostScript previewer (or -similar). In this case the graph is printed in portrait mode without -scaling. The output is unsuitable for a laser printer. - - - - --l - - -Normally a profile is limited to 20 bands with additional identifiers -being grouped into an OTHER band. The -l flag removes this -20 band and limit, producing as many bands as necessary. No key is -produced as it won't fit!. It is useful for creation time profiles -with many bands. - - - - --m<int> - - -Normally a profile is limited to 20 bands with additional identifiers -being grouped into an OTHER band. The -m flag specifies an -alternative band limit (the maximum is 20). - - - --m0 requests the band limit to be removed. As many bands as -necessary are produced. However no key is produced as it won't fit! It -is useful for displaying creation time profiles with many bands. - - - - --p - - -Use previous parameters. By default, the PostScript graph is -automatically scaled both horizontally and vertically so that it fills -the page. However, when preparing a series of graphs for use in a -presentation, it is often useful to draw a new graph using the same -scale, shading and ordering as a previous one. The -p flag causes -the graph to be drawn using the parameters determined by a previous -run of hp2ps on file. These are extracted from -file@.aux. - - - - --s - - -Use a small box for the title. - - - - --t<float> - - -Normally trace elements which sum to a total of less than 1% of the -profile are removed from the profile. The -t option allows this -percentage to be modified (maximum 5%). - - - --t0 requests no trace elements to be removed from the profile, -ensuring that all the data will be displayed. - - - - --? - - -Print out usage information. 
- - - - - - - - - -<Literal>stat2resid</Literal>—residency info from GC stats - - - -stat2resid (utility) -GC stats—residency info -residency, from GC stats - - - -Usage: - - - - - -stat2resid [<file>[.stat] [<outfile>]] - - - - - -The program stat2residstat2resid converts a detailed -garbage collection statistics file produced by the --S-S RTS option runtime option into a PostScript heap -residency graph. The garbage collection statistics file can be -produced without compiling your program for profiling. - - - -By convention, the file to be processed by stat2resid has a -.stat extension. If the <outfile> is not specified the -PostScript will be written to <file>@.resid.ps. If -<file> is omitted entirely, then the program behaves as a filter. - - - -The plot can not be produced from the statistics file for a -generational collector, though a suitable stats file can be produced -using the -F2s-F2s RTS option runtime option when the -program has been compiled for generational garbage collection (the -default). - - - -stat2resid is distributed in ghc/utils/stat2resid in a GHC source -distribution. - - - - - - - -Using ``ticky-ticky'' profiling (for implementors) - - - -ticky-ticky profiling (implementors) - - - -(ToDo: document properly.) - - - -It is possible to compile Glasgow Haskell programs so that they will -count lots and lots of interesting things, e.g., number of updates, -number of data constructors entered, etc., etc. We call this -``ticky-ticky'' profiling,ticky-ticky profiling -profiling, ticky-ticky because that's the sound a Sun4 makes -when it is running up all those counters (slowly). - - - -Ticky-ticky profiling is mainly intended for implementors; it is quite -separate from the main ``cost-centre'' profiling system, intended for -all users everywhere. - - - -To be able to use ticky-ticky profiling, you will need to have built -appropriate libraries and things when you made the system. 
See -``Customising what libraries to build,'' in the installation guide. - - - -To get your compiled program to spit out the ticky-ticky numbers, use -a -r RTS option-r RTS option. See . - - - -Compiling your program with the -ticky switch yields an executable -that performs these counts. Here is a sample ticky-ticky statistics -file, generated by the invocation foo +RTS -rfoo.ticky. - - - - - + + Profiling + profiling + + cost-centre profiling + + Glasgow Haskell comes with a time and space profiling + system. Its purpose is to help you improve your understanding of + your program's execution behaviour, so you can improve it. + + Any comments, suggestions and/or improvements you have are + welcome. Recommended “profiling tricks” would be + especially cool! + + Profiling a program is a three-step process: + + + + Re-compile your program for profiling with the + -prof option, and probably one of the + -auto or -auto-all + options. These options are described in more detail in + -prof + + -auto + + -auto-all + + + + + Run your program with one of the profiling options, eg. + +RTS -p -RTS. This generates a file of + profiling information. + RTS + option + + + + Examine the generated profiling information, using one of + GHC's profiling tools. The tool to use will depend on the kind + of profiling information generated. + + + + + + Cost centres and cost-centre stacks + + GHC's profiling system assigns costs + to cost centres. A cost is simply the time + or space required to evaluate an expression. Cost centres are + program annotations around expressions; all costs incurred by the + annotated expression are assigned to the enclosing cost centre. + Furthermore, GHC will remember the stack of enclosing cost centres + for any given expression at run-time and generate a call-graph of + cost attributions. 
+ + Let's take a look at an example: + + +main = print (nfib 25) +nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) + + + Compile and run this program as follows: + + +$ ghc -prof -auto-all -o Main Main.hs +$ ./Main +RTS -p +121393 +$ + + + When a GHC-compiled program is run with the + RTS option, it generates a file called + <prog>.prof. In this case, the file + will contain something like this: + + + Fri May 12 14:06 2000 Time and Allocation Profiling Report (Final) + + Main +RTS -p -RTS + + total time = 0.14 secs (7 ticks @ 20 ms) + total alloc = 8,741,204 bytes (excludes profiling overheads) + +COST CENTRE MODULE %time %alloc + +nfib Main 100.0 100.0 + + + individual inherited +COST CENTRE MODULE entries %time %alloc %time %alloc + +MAIN MAIN 0 0.0 0.0 100.0 100.0 + main Main 0 0.0 0.0 0.0 0.0 + CAF PrelHandle 3 0.0 0.0 0.0 0.0 + CAF PrelAddr 1 0.0 0.0 0.0 0.0 + CAF Main 6 0.0 0.0 100.0 100.0 + main Main 1 0.0 0.0 100.0 100.0 + nfib Main 242785 100.0 100.0 100.0 100.0 + + + + The first part of the file gives the program name and + options, and the total time and total memory allocation measured + during the run of the program (note that the total memory + allocation figure isn't the same as the amount of + live memory needed by the program at any one + time; the latter can be determined using heap profiling, which we + will describe shortly). + + The second part of the file is a break-down by cost centre + of the most costly functions in the program. In this case, there + was only one significant function in the program, namely + nfib, and it was responsible for 100% + of both the time and allocation costs of the program. + + The third and final section of the file gives a profile + break-down by cost-centre stack. This is roughly a call-graph + profile of the program. In the example above, it is clear that + the costly call to nfib came from + main. 
+ + The time and allocation incurred by a given part of the + program is displayed in two ways: “individual”, which + are the costs incurred by the code covered by this cost centre + stack alone, and “inherited”, which includes the costs + incurred by all the children of this node. + + The usefulness of cost-centre stacks is better demonstrated + by modifying the example slightly: + + +main = print (f 25 + g 25) +f n = nfib n +g n = nfib (n `div` 2) +nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) + + + Compile and run this program as before, and take a look at + the new profiling results: + + +COST CENTRE MODULE scc %time %alloc %time %alloc + +MAIN MAIN 0 0.0 0.0 100.0 100.0 + main Main 0 0.0 0.0 0.0 0.0 + CAF PrelHandle 3 0.0 0.0 0.0 0.0 + CAF PrelAddr 1 0.0 0.0 0.0 0.0 + CAF Main 9 0.0 0.0 100.0 100.0 + main Main 1 0.0 0.0 100.0 100.0 + g Main 1 0.0 0.0 0.0 0.2 + nfib Main 465 0.0 0.2 0.0 0.2 + f Main 1 0.0 0.0 100.0 99.8 + nfib Main 242785 100.0 99.8 100.0 99.8 + + + Now although we had two calls to nfib + in the program, it is immediately clear that it was the call from + f which took all the time. + + The actual meaning of the various columns in the output is: + + + + entries + + The number of times this particular point in the call + graph was entered. + + + + + individual %time + + The percentage of the total run time of the program + spent at this point in the call graph. + + + + + individual %alloc + + The percentage of the total memory allocations + (excluding profiling overheads) of the program made by this + call. + + + + + inherited %time + + The percentage of the total run time of the program + spent below this point in the call graph. + + + + + inherited %alloc + + The percentage of the total memory allocations + (excluding profiling overheads) of the program made by this + call and all of its sub-calls. 
+ + + + + In addition you can use the -P RTS option + to + get the following additional information: + + + + ticks + + The raw number of time “ticks” which were + attributed to this cost-centre; from this, we get the + %time figure mentioned + above. + + + + + bytes + + Number of bytes allocated in the heap while in this + cost-centre; again, this is the raw number from which we get + the %alloc figure mentioned + above. + + + + + What about recursive functions, and mutually recursive + groups of functions? Where are the costs attributed? Well, + although GHC does keep information about which groups of functions + called each other recursively, this information isn't displayed in + the basic time and allocation profile; instead, the call-graph is + flattened into a tree. The XML profiling tool (described in ) will be able to display real loops in + the call-graph. + + Inserting cost centres by hand + + Cost centres are just program annotations. When you say + -auto-all to the compiler, it automatically + inserts a cost centre annotation around every top-level function + in your program, but you are entirely free to add the cost + centre annotations yourself. + + The syntax of a cost centre annotation is + + + {-# SCC "name" #-} <expression> + + + where "name" is an arbitrary string + that will become the name of your cost centre as it appears + in the profiling output, and + <expression> is any Haskell + expression. An SCC annotation extends as + far to the right as possible when parsing. + + + + + Rules for attributing costs + + The cost of evaluating any expression in your program is + attributed to a cost-centre stack using the following rules: + + + + If the expression is part of the + one-off costs of evaluating the + enclosing top-level definition, then costs are attributed to + the stack of lexically enclosing SCC + annotations on top of the special CAF + cost-centre.
+ + + + Otherwise, costs are attributed to the stack of + lexically-enclosing SCC annotations, + appended to the cost-centre stack in effect at the + call site of the current top-level + definition (the call-site is just the place + in the source code which mentions the particular function or + variable). Notice that this is a recursive + definition. + + + + Time spent in foreign code (see ) + is always attributed to the cost centre in force at the + Haskell call-site of the foreign function. + + + + What do we mean by one-off costs? Well, Haskell is a lazy + language, and certain expressions are only ever evaluated once. + For example, if we write: + + +x = nfib 25 + + + then x will only be evaluated once (if + at all), and subsequent demands for x will + immediately get to see the cached result. The definition + x is called a CAF (Constant Applicative + Form), because it has no arguments. + + For the purposes of profiling, we say that the expression + nfib 25 belongs to the one-off costs of + evaluating x. + + Since one-off costs aren't strictly speaking part of the + call-graph of the program, they are attributed to a special + top-level cost centre, CAF. There may be one + CAF cost centre for each module (the + default), or one for each top-level definition with any one-off + costs (this behaviour can be selected by giving GHC the + -caf-all flag). + + -caf-all + + + If you think you have a weird profile, or the call-graph + doesn't look like you expect it to, feel free to send it (and + your program) to us at + glasgow-haskell-bugs@haskell.org. + + + + + Compiler options for profiling + + profiling, options + options, for profiling + + + + -prof: + + + To make use of the profiling system + all modules must be compiled and linked + with the -prof option. Any + SCC annotations you've put in your source + will spring to life. + + Without a -prof option, your + SCCs are ignored; so you can compile + SCC-laden code without changing + it.
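As a rough illustration of hand-written annotations and of one-off (CAF) costs, here is a minimal sketch. Only nfib comes from the text above; the names cached, direct, "cached_nfib" and "direct_nfib" are hypothetical and chosen purely for this example. Compiled without -prof, the SCC annotations are simply ignored, so the program runs unchanged.

```haskell
module Main (main) where

-- nfib is the function used in the examples above.
nfib :: Int -> Integer
nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2)

-- A CAF: it takes no arguments, so it is evaluated at most once.
-- Under the rules above, its cost should land on the stack of
-- enclosing SCC annotations on top of the special CAF cost centre.
-- (cached and "cached_nfib" are hypothetical names.)
cached :: Integer
cached = {-# SCC "cached_nfib" #-} nfib 20

main :: IO ()
main = do
  -- A hand-placed annotation around an ordinary expression; the
  -- annotation extends as far to the right as possible.
  let direct = {-# SCC "direct_nfib" #-} nfib 15
  print (cached + direct)
```

Building this with -prof and running it with +RTS -p should show both hand-named cost centres in the report, without any need for -auto-all.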
+ + + + + There are a few other profiling-related compilation options. + Use them in addition to + -prof. These do not have to be used consistently + for all modules in a program. + + + + -auto: + + cost centres, automatically inserting + + GHC will automatically add + _scc_ constructs for all + top-level, exported functions. + + + + + -auto-all: + + + All top-level functions, + exported or not, will be automatically + _scc_'d. + + + + + -caf-all: + + + The costs of all CAFs in a module are usually + attributed to one “big” CAF cost-centre. With + this option, all CAFs get their own cost-centre. An + “if all else fails” option… + + + + + -ignore-scc: + + + Ignore any _scc_ + constructs, so a module which already has + _scc_s can be compiled + for profiling with the annotations ignored. + + + + + + + + + Time and allocation profiling + + To generate a time and allocation profile, give one of the + following RTS options to the compiled program when you run it (RTS + options should be enclosed between +RTS...-RTS + as usual): + + + + -p or -P: + + + time profile + + The -p option produces a standard + time profile report. It is written + into the file + program.prof. + + The -P option produces a more + detailed report containing the actual time and allocation + data as well. (Not used much.) + + + + + -px: + + + The -px option generates profiling + information in the XML format understood by our new + profiling tool, see . + + + + + -xc + RTS + option + + This option makes use of the extra information + maintained by the cost-centre-stack profiler to provide + useful information about the location of runtime errors. + See . + + + + + + + + + Profiling memory usage + + In addition to profiling the time and allocation behaviour + of your program, you can also generate a graph of its memory usage + over time. This is useful for detecting the causes of + space leaks, when your program holds on to + more memory at run-time than it needs to.
Space leaks lead to + longer run-times due to heavy garbage collector activity, and may + even cause the program to run out of memory altogether. + + To generate a heap profile from your program: + + + + Compile the program for profiling (). + + + Run it with one of the heap profiling options described + below (e.g. -hc for a basic producer profile). + This generates the file + prog.hp. + + + Run hp2ps to produce a PostScript + file, + prog.ps. The + hp2ps utility is described in detail in + . + + + Display the heap profile using a PostScript viewer such + as Ghostview, or print it out on a + PostScript-capable printer. + + + + + RTS options for heap profiling + + There are several different kinds of heap profile that can + be generated. All the different profile types yield a graph of + live heap against time, but they differ in how the live heap is + broken down into bands. The following RTS options select which + break-down to use: + + + + -hc + RTS + option + + Breaks down the graph by the cost-centre stack which + produced the data. + + + + + -hm + RTS + option + + Breaks down the live heap by the module containing + the code which produced the data. + + + + + -hd + RTS + option + + Breaks down the graph by closure + description. For actual data, the description + is just the constructor name; for other closures it is a + compiler-generated string identifying the closure. + + + + + -hy + RTS + option + + Breaks down the graph by + type. For closures which have + function type or unknown/polymorphic type, the string will + represent an approximation to the actual type. + + + + + -hr + RTS + option + + Breaks down the graph by retainer + set. Retainer profiling is described in more + detail below (). + + + + + -hb + RTS + option + + Breaks down the graph by + biography. Biographical profiling + is described in more detail below ().
+ + + + + In addition, the profile can be restricted to heap data + which satisfies certain criteria - for example, you might want + to display a profile by type but only for data produced by a + certain module, or a profile by retainer for a certain type of + data. Restrictions are specified as follows: + + + + -hcname,... + RTS + option + + Restrict the profile to closures produced by + cost-centre stacks with one of the specified cost centres + at the top. + + + + + -hCname,... + RTS + option + + Restrict the profile to closures produced by + cost-centre stacks with one of the specified cost centres + anywhere in the stack. + + + + + -hmmodule,... + RTS + option + + Restrict the profile to closures produced by the + specified modules. + + + + + -hddesc,... + RTS + option + + Restrict the profile to closures with the specified + description strings. + + + + + -hytype,... + RTS + option + + Restrict the profile to closures with the specified + types. + + + + + -hrcc,... + RTS + option + + Restrict the profile to closures with retainer sets + containing cost-centre stacks with one of the specified + cost centres at the top. + + + + + -hbbio,... + RTS + option + + Restrict the profile to closures with one of the + specified biographies, where + bio is one of + lag, drag, + void, or use. + + + + + For example, the following options will generate a + retainer profile restricted to Branch and + Leaf constructors: + + +prog +RTS -hr -hdBranch,Leaf + + + There can only be one "break-down" option + (e.g. -hr in the example above), but there is no + limit on the number of further restrictions that may be applied. + All the options may be combined, with one exception: GHC doesn't + currently support mixing the -hr and -hb + options. + + There's one more option which relates to heap + profiling: + + + + -isecs: + + + Set the profiling (sampling) interval to + secs seconds (the default is + 0.1 second). Fractions are allowed: for example + -i0.2 will get 5 samples per second.
+ This only affects heap profiling; time profiles are always + sampled at 1/50 second intervals. + + + + + + + + Retainer Profiling + + Retainer profiling is designed to help answer questions + like “why is this data being retained?”. We start + by defining what we mean by a retainer: + +
+ A retainer is either the system stack, or an unevaluated + closure (thunk). +
+ + In particular, constructors are not + retainers. + + An object A is retained by an object B if object A can be + reached by recursively following pointers starting from object + B but not meeting any other retainers on the way. Each object + has one or more retainers, collectively called its + retainer set. + + When retainer profiling is requested by giving the program + the option, a graph is generated which is + broken down by retainer set. A retainer set is displayed as a + set of cost-centre stacks; because this is usually too large to + fit on the profile graph, each retainer set is numbered and + shown abbreviated on the graph along with its number, and the + full list of retainer sets is dumped into the file + prog.prof. + + Retainer profiling requires multiple passes over the live + heap in order to discover the full retainer set for each + object, which can be quite slow. So we set a limit on the + maximum size of a retainer set, where all retainer sets larger + than the maximum retainer set size are replaced by the special + set MANY. The maximum set size defaults to 8 + and can be altered with the RTS + option: + + + + size + + Restrict the number of elements in a retainer set to + size (default 8). + + + + + + Hints for using retainer profiling + + The definition of retainers is designed to reflect a + common cause of space leaks: a large structure is retained by + an unevaluated computation, and will be released once the + computation is forced. A good example is looking up a value in + a finite map, where unless the lookup is forced in a timely + manner the unevaluated lookup will cause the whole mapping to + be retained. These kinds of space leaks can often be + eliminated by forcing the relevant computations to be + performed eagerly, using seq or strictness + annotations on data constructor fields. 
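The finite-map case described above can be sketched as follows. This is only an illustration: the types LazyResult and StrictResult and the lookup functions are hypothetical names invented here, Data.Map stands in for whatever finite map the program uses, and whether the lazy version really leaks depends on how and when the result is consumed and on the optimisations GHC applies.

```haskell
module Main where

import qualified Data.Map as Map

-- Hypothetical result types, invented for this sketch.
-- Lazy field: the constructor may hold an unevaluated (Map.lookup k m)
-- thunk, and that thunk keeps the whole map reachable until it is forced.
data LazyResult = LazyResult (Maybe String)

-- Strict field: building the value forces the lookup to weak head normal
-- form, so the result no longer refers to the map itself.
data StrictResult = StrictResult !(Maybe String)

lookupLazy :: Int -> Map.Map Int String -> LazyResult
lookupLazy k m = LazyResult (Map.lookup k m)

lookupStrict :: Int -> Map.Map Int String -> StrictResult
lookupStrict k m = StrictResult (Map.lookup k m)

-- The same effect using seq instead of a strictness annotation.
lookupSeq :: Int -> Map.Map Int String -> LazyResult
lookupSeq k m = let r = Map.lookup k m in r `seq` LazyResult r

main :: IO ()
main = do
  let m = Map.fromList [(i, show i) | i <- [1 .. 100000 :: Int]]
      LazyResult a = lookupLazy 42 m
      StrictResult b = lookupStrict 42 m
      LazyResult c = lookupSeq 42 m
  print (a, b, c)
```

All three variants compute the same answer; they differ only in when the lookup is performed, which is the property a retainer profile would be expected to expose.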
+ + Often a particular data structure is being retained by a + chain of unevaluated closures, only the nearest of which will + be reported by retainer profiling - for example A retains B, B + retains C, and C retains a large structure. There might be a + large number of Bs but only a single A, so A is really the one + we're interested in eliminating. However, retainer profiling + will in this case report B as the retainer of the large + structure. To move further up the chain of retainers, we can + ask for another retainer profile but this time restrict the + profile to B objects, so we get a profile of the retainers of + B: + + +prog +RTS -hr -hcB + + + This trick isn't foolproof, because there might be other + B closures in the heap which aren't the retainers we are + interested in, but we've found this to be a useful technique + in most cases. + +
+ + + Biographical Profiling + + A typical heap object may be in one of the following four + states at each point in its lifetime: + + + + The lag stage, which is the + time between creation and the first use of the + object, + + + the use stage, which lasts from + the first use until the last use of the object, and + + + the drag stage, which lasts + from the final use until the last reference to the object + is dropped. + + + An object which is never used is said to be in the + void state for its whole + lifetime. + + + + A biographical heap profile displays the portion of the + live heap in each of the four states listed above. Usually the + most interesting states are the void and drag states: live heap + in these states is more likely to be wasted space than heap in + the lag or use states. + + It is also possible to break down the heap in one or more + of these states by different criteria, by restricting a + profile by biography. For example, to show the portion of the + heap in the drag or void state by producer: + + +prog +RTS -hc -hbdrag,void + + + Once you know the producer or the type of the heap in the + drag or void states, the next step is usually to find the + retainer(s): + + +prog +RTS -hr -hccc... + + + NOTE: this two-stage process is required because GHC + cannot currently profile using both biographical and retainer + information simultaneously. + + +
+ + + Graphical time/allocation profile + + You can view the time and allocation profiling graph of your + program graphically, using ghcprof. This is a + new tool with GHC 4.08, and will eventually be the de-facto + standard way of viewing GHC profiles (actually this + isn't true any more: we are working on a new tool for + displaying heap profiles using Gtk+HS, so + ghcprof may go away at some point in the future). + + + To run ghcprof, you need + daVinci installed, which can be + obtained from The Graph + Visualisation Tool daVinci. Install one of + the binary + distributions (daVinci is + sadly not open-source :-( ), and set your + DAVINCIHOME environment variable to point to the + installation directory. + + ghcprof uses an XML-based profiling log + format, and you therefore need to run your program with a + different option: . The file generated is + still called <prog>.prof. To see the + profile, run ghcprof like this: + + + + +$ ghcprof <prog>.prof + + + which should pop up a window showing the call-graph of your + program in glorious detail. More information on using + ghcprof can be found at The + Cost-Centre Stack Profiling Tool for + GHC. + + + + + <command>hp2ps</command>––heap profile to PostScript + + hp2ps + heap profiles + postscript, from heap profiles + + + Usage: + + +hp2ps [flags] [<file>[.hp]] + + + The program + hp2ps + converts a heap profile as produced + by the runtime option into a + PostScript graph of the heap profile. By convention, the file to + be processed by hp2ps has a + .hp extension. The PostScript output is + written to <file>@.ps. If + <file> is omitted entirely, then the + program behaves as a filter. + + hp2ps is distributed in + ghc/utils/hp2ps in a GHC source + distribution. It was originally developed by Dave Wakeling as part + of the HBC/LML heap profiler. + + The flags are: + + + + + + + In order to make graphs more readable, + hp2ps sorts the shaded bands for each + identifier. 
The default sort ordering is for the bands with + the largest area to be stacked on top of the smaller ones. + The option causes rougher bands (those + representing series of values with the largest standard + deviations) to be stacked on top of smoother ones. + + + + + + + Normally, hp2ps puts the title of + the graph in a small box at the top of the page. However, if + the JOB string is too long to fit in a small box (more than + 35 characters), then hp2ps will choose to + use a big box instead. The option + forces hp2ps to use a big box. + + + + + + + Generate encapsulated PostScript suitable for + inclusion in LaTeX documents. Usually, the PostScript graph + is drawn in landscape mode in an area 9 inches wide by 6 + inches high, and hp2ps arranges for this + area to be approximately centred on a sheet of A4 paper. + This format is convenient for studying the graph in detail, + but it is unsuitable for inclusion in LaTeX documents. The + option causes the graph to be drawn in + portrait mode, with float specifying the width in inches, + millimetres or points (the default). The resulting + PostScript file conforms to the Encapsulated PostScript + (EPS) convention, and it can be included in a LaTeX document + using Rokicki's dvi-to-PostScript converter + dvips. + + + + + + + Create output suitable for the gs + PostScript previewer (or similar). In this case the graph is + printed in portrait mode without scaling. The output is + unsuitable for a laser printer. + + + + + + + Normally a profile is limited to 20 bands with + additional identifiers being grouped into an + OTHER band. The flag + removes this 20 band limit, producing as many bands as + necessary. No key is produced as it won't fit! It is useful + for creation time profiles with many bands. + + + + + + + Normally a profile is limited to 20 bands with + additional identifiers being grouped into an + OTHER band. The flag + specifies an alternative band limit (the maximum is + 20). 
+ + requests the band limit to be + removed. As many bands as necessary are produced. However, no + key is produced as it won't fit! It is useful for displaying + creation time profiles with many bands. + + + + + + + Use previous parameters. By default, the PostScript + graph is automatically scaled both horizontally and + vertically so that it fills the page. However, when + preparing a series of graphs for use in a presentation, it + is often useful to draw a new graph using the same scale, + shading and ordering as a previous one. The + flag causes the graph to be drawn using + the parameters determined by a previous run of + hp2ps on file. These + are extracted from file@.aux. + + + + + + + Use a small box for the title. + + + + + + + Normally trace elements which sum to a total of less + than 1% of the profile are removed from the + profile. The option allows this + percentage to be modified (maximum 5%). + + requests no trace elements to be + removed from the profile, ensuring that all the data will be + displayed. + + + + + + + Generate colour output. + + + + + + + Ignore marks. + + + + + + + Print out usage information. + + + + + + + Using “ticky-ticky” profiling (for implementors) + ticky-ticky profiling + + (ToDo: document properly.) + + It is possible to compile Glasgow Haskell programs so that + they will count lots and lots of interesting things, e.g., number + of updates, number of data constructors entered, etc., etc. We + call this “ticky-ticky” + profiling, because that's the sound a Sun4 + makes when it is running up all those counters + (slowly). + + Ticky-ticky profiling is mainly intended for implementors; + it is quite separate from the main “cost-centre” + profiling system, intended for all users everywhere. + + To be able to use ticky-ticky profiling, you will need to + have built appropriate libraries and things when you made the + system. 
See “Customising what libraries to build,” in + the installation guide. + + To get your compiled program to spit out the ticky-ticky + numbers, use a RTS + option-r RTS option. + See . + + Compiling your program with the + switch yields an executable that performs these counts. Here is a + sample ticky-ticky statistics file, generated by the invocation + foo +RTS -rfoo.ticky. + + foo +RTS -rfoo.ticky @@ -983,30 +1181,33 @@ Total bytes copied during GC: 190096 0 GC_SEL_MAJOR_ctr 0 GC_FAILED_PROMOTION_ctr 47524 GC_WORDS_COPIED_ctr - - -
- - -The formatting of the information above the row of asterisks is -subject to change, but hopefully provides a useful human-readable -summary. Below the asterisks all counters maintained by the -ticky-ticky system are dumped, in a format intended to be -machine-readable: zero or more spaces, an integer, a space, the -counter name, and a newline. - - - -In fact, not all counters are necessarily dumped; compile- or -run-time flags can render certain counters invalid. In this case, -either the counter will simply not appear, or it will appear with a -modified counter name, possibly along with an explanation for the -omission (notice ENT_PERM_IND_ctr appears with an inserted ! -above). Software analysing this output should always check that it -has the counters it expects. Also, beware: some of the counters can -have large values! - - -
- -
+ + + The formatting of the information above the row of asterisks + is subject to change, but hopefully provides a useful + human-readable summary. Below the asterisks all + counters maintained by the ticky-ticky system are + dumped, in a format intended to be machine-readable: zero or more + spaces, an integer, a space, the counter name, and a newline. + + In fact, not all counters are + necessarily dumped; compile- or run-time flags can render certain + counters invalid. In this case, either the counter will simply + not appear, or it will appear with a modified counter name, + possibly along with an explanation for the omission (notice + ENT_PERM_IND_ctr appears + with an inserted ! above). Software analysing + this output should always check that it has the counters it + expects. Also, beware: some of the counters can have + large values! + + + + + +
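A minimal sketch of software that reads the machine-readable part of the dump, assuming exactly the line format described above (zero or more spaces, an integer, a space, the counter name). The function names parseCounter, parseCounters and missingCounters are hypothetical and not part of GHC. Lines whose name part contains whitespace, such as a modified counter name followed by an explanation, are deliberately skipped here, so those counters are reported as missing. Only the text below the row of asterisks should be fed to it.

```haskell
module Main where

import Data.Char (isDigit, isSpace)
import Data.Maybe (mapMaybe)
import qualified Data.Map as Map

-- Parse one counter line: zero or more spaces, an integer, a space, the
-- counter name. Integer rather than Int, since counters can be large.
parseCounter :: String -> Maybe (String, Integer)
parseCounter line =
  case span isDigit (dropWhile (== ' ') line) of
    (digits@(_ : _), ' ' : name)
      | not (null name), not (any isSpace name) -> Just (name, read digits)
    _ -> Nothing

-- Collect every well-formed counter line; all other lines are skipped.
parseCounters :: String -> Map.Map String Integer
parseCounters = Map.fromList . mapMaybe parseCounter . lines

-- Report which of the expected counters are absent from the dump.
missingCounters :: [String] -> Map.Map String Integer -> [String]
missingCounters expected found = filter (`Map.notMember` found) expected

main :: IO ()
main = do
  let sample = unlines
        [ "         0 GC_SEL_MAJOR_ctr"
        , "         0 GC_FAILED_PROMOTION_ctr"
        , "     47524 GC_WORDS_COPIED_ctr"
        ]
      counters = parseCounters sample
  print (Map.lookup "GC_WORDS_COPIED_ctr" counters)
  print (missingCounters ["GC_WORDS_COPIED_ctr", "ENT_PERM_IND_ctr"] counters)
```

Checking for missing counters up front, rather than assuming a fixed set, follows the advice above that compile- or run-time flags can make some counters disappear or change name.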