From e01afc7ed018e28dbb0e9447714e630acbd0a9cc Mon Sep 17 00:00:00 2001 From: simonmar Date: Wed, 19 Dec 2001 15:51:49 +0000 Subject: [PATCH] [project @ 2001-12-19 15:51:49 by simonmar] Documentation update to describe the new retainer profiling and biographical profiling features. --- ghc/docs/users_guide/profiling.sgml | 778 ++++++++++++++++++++++------------- 1 file changed, 486 insertions(+), 292 deletions(-) diff --git a/ghc/docs/users_guide/profiling.sgml b/ghc/docs/users_guide/profiling.sgml index e89a498..b7e3fd3 100644 --- a/ghc/docs/users_guide/profiling.sgml +++ b/ghc/docs/users_guide/profiling.sgml @@ -4,13 +4,13 @@ cost-centre profiling - Glasgow Haskell comes with a time and space profiling + Glasgow Haskell comes with a time and space profiling system. Its purpose is to help you improve your understanding of - your program's execution behaviour, so you can improve it. + your program's execution behaviour, so you can improve it. - Any comments, suggestions and/or improvements you have are + Any comments, suggestions and/or improvements you have are welcome. Recommended “profiling tricks” would be - especially cool! + especially cool! Profiling a program is a three-step process: @@ -30,12 +30,10 @@ - Run your program with one of the profiling options - -p or -h. This generates - a file of profiling information. - -pRTS - option - -hRTS + Run your program with one of the profiling options, eg. + +RTS -p -RTS. This generates a file of + profiling information. + RTS option @@ -94,7 +92,7 @@ nfib Main 100.0 100.0 individual inherited -COST CENTRE MODULE scc %time %alloc %time %alloc +COST CENTRE MODULE entries %time %alloc %time %alloc MAIN MAIN 0 0.0 0.0 100.0 100.0 main Main 0 0.0 0.0 0.0 0.0 @@ -218,20 +216,20 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0 ticks - The raw number of time “ticks” which were + The raw number of time “ticks” which were attributed to this cost-centre; from this, we get the %time figure mentioned - above. + above. 
bytes - Number of bytes allocated in the heap while in this + Number of bytes allocated in the heap while in this cost-centre; again, this is the raw number from which we get the %alloc figure mentioned - above. + above. @@ -333,139 +331,77 @@ x = nfib 25 - - Profiling memory usage - - In addition to profiling the time and allocation behaviour - of your program, you can also generate a graph of its memory usage - over time. This is useful for detecting the causes of - space leaks, when your program holds on to - more memory at run-time that it needs to. Space leaks lead to - longer run-times due to heavy garbage collector ativity, and may - even cause the program to run out of memory altogether. - - To generate a heap profile from your program, compile it as - before, but this time run it with the runtime - option. This generates a file - <prog>.hp file, which you then process - with hp2ps to produce a Postscript file - <prog>.ps. The Postscript file can be - viewed with something like ghostview, or - printed out on a Postscript-compatible printer. - - For the RTS options that control the kind of heap profile - generated, see . Details on the - usage of the hp2ps program are given in - - - - - Graphical time/allocation profile - - You can view the time and allocation profiling graph of your - program graphically, using ghcprof. This is a - new tool with GHC 4.08, and will eventually be the de-facto - standard way of viewing GHC profiles. - - To run ghcprof, you need - daVinci installed, which can be - obtained from The Graph - Visualisation Tool daVinci. Install one of - the binary - distributionsdaVinci is - sadly not open-source :-(., and set your - DAVINCIHOME environment variable to point to the - installation directory. - - ghcprof uses an XML-based profiling log - format, and you therefore need to run your program with a - different option: . The file generated is - still called <prog>.prof. 
To see the - profile, run ghcprof like this: - - - - -$ ghcprof <prog>.prof - - - which should pop up a window showing the call-graph of your - program in glorious detail. More information on using - ghcprof can be found at The - Cost-Centre Stack Profiling Tool for - GHC. - - - Compiler options for profiling profilingoptions optionsfor profiling - To make use of the cost centre profiling system - all modules must be compiled and linked with - the option. Any - _scc_ constructs you've put in - your source will spring to life. - - -prof - - Without a option, your - _scc_s are ignored; so you can - compiled _scc_-laden code - without changing it. - - There are a few other profiling-related compilation options. - Use them in addition to - . These do not have to be used consistently - for all modules in a program. - - - : - -auto + : + + + To make use of the profiling system + all modules must be compiled and linked + with the option. Any + SCC annotations you've put in your source + will spring to life. + + Without a option, your + SCCs are ignored; so you can compile + SCC-laden code without changing + it. + + + + + There are a few other profiling-related compilation options. + Use them in addition to + . These do not have to be used consistently + for all modules in a program. + + + + : + cost centresautomatically inserting - GHC will automatically add + GHC will automatically add _scc_ constructs for all - top-level, exported functions. + top-level, exported functions. - : - -auto-all + : + - All top-level functions, + All top-level functions, exported or not, will be automatically - _scc_'d. + _scc_'d. - : - -caf-all + : + - The costs of all CAFs in a module are usually + The costs of all CAFs in a module are usually attributed to one “big” CAF cost-centre. With this option, all CAFs get their own cost-centre. 
An - “if all else fails” option… + “if all else fails” option… - : - -ignore-scc + : + - Ignore any _scc_ + Ignore any _scc_ constructs, so a module which already has _scc_s can be compiled - for profiling with the annotations ignored. + for profiling with the annotations ignored. @@ -473,47 +409,29 @@ $ ghcprof <prog>.prof - - Runtime options for profiling - - profiling RTS options - RTS options, for profiling + + Time and allocation profiling - It isn't enough to compile your program for profiling with - ! - - When you run your profiled program, you - must tell the runtime system (RTS) what you want to profile (e.g., - time and/or space), and how you wish the collected data to be - reported. You also may wish to set the sampling interval used in - time profiling. - - Executive summary: ./a.out +RTS -pT - produces a time profile in a.out.prof; - ./a.out +RTS -hC produces space-profiling info - which can be mangled by hp2ps and viewed with - ghostview (or equivalent). - - Profiling runtime flags are passed to your program between - the usual and - options. + To generate a time and allocation profile, give one of the + following RTS options to the compiled program when you run it (RTS + options should be enclosed between +RTS...-RTS + as usual): - or : time profile - The option produces a standard + The option produces a standard time profile report. It is written into the file - <program>.prof. + program.prof. - The option produces a more + The option produces a more detailed report containing the actual time and allocation - data as well. (Not used much.) + data as well. (Not used much.) @@ -528,153 +446,6 @@ $ ghcprof <prog>.prof - : - - - Set the profiling (sampling) interval to - <secs> seconds (the default is - 1 second). Fractions are allowed: for example - will get 5 samples per second. This - only affects heap profiling; time profiles are always - sampled on a 1/50 second frequency. 
- - - - - : - - heap profile - - Produce a detailed heap profile - of the heap occupied by live closures. The profile is - written to the file <program>.hp - from which a PostScript graph can be produced using - hp2ps (see ). - - The heap space profile may be broken down by different - criteria: - - - - - : - - cost centre which produced the closure (the - default). - - - - - : - - cost centre module which produced the - closure. - - - - - : - - closure description—a string describing - the closure. - - - - - : - - closure type—a string describing the - closure's type. - - - - - - - - - : - - heap profile filtering options - - It's often useful to select just some subset of the - heap when profiling. To do this, the following filters are - available. You may use multiple filters, in which case a - closure has to satisfy all filters to appear in the final - profile. Filtering criterion are independent of what it is - you ask to see. So, for example, you can specify a profile - by closure description (-hD) but ask to - filter closures by producer module (-hm{...}). - - - Available filters are: - - - - - : - - Restrict to one of the specified cost centers. - Since GHC deals in cost center stacks, the specified - cost centers pertain to the top stack element. For - example, -hc{Wurble,Burble} selects - all cost center stacks whose top element is - Wurble or - Burble. - - - - - - : - - Restrict to closures produced by functions in - one of the specified modules. - - - - - - : - - Restrict to closures whose description-string is - one of the specified descriptions. Description - strings are pretty arcane. An easy way to find - plausible strings to specify is to first do a - -hD profile and then inspect the - description-strings which appear in the resulting profile. - - - - - - : - - Restrict to closures having one of the specified - types. 
- - - - - - - - - - : - - - The option generates heap - profiling information in the XML format understood by our - new profiling tool (NOTE: heap profiling with the new tool - is not yet working! Use hp2ps-style heap - profiling for the time being). - - - - RTS option @@ -690,6 +461,429 @@ $ ghcprof <prog>.prof + + Profiling memory usage + + In addition to profiling the time and allocation behaviour + of your program, you can also generate a graph of its memory usage + over time. This is useful for detecting the causes of + space leaks, when your program holds on to + more memory at run-time than it needs to. Space leaks lead to + longer run-times due to heavy garbage collector activity, and may + even cause the program to run out of memory altogether. + + To generate a heap profile from your program: + + + + Compile the program for profiling (). + + + Run it with one of the heap profiling options described + below (eg. for a basic producer profile). + This generates the file + prog.hp. + + + Run hp2ps to produce a Postscript + file, + prog.ps. The + hp2ps utility is described in detail in + . + + + Display the heap profile using a postscript viewer such + as Ghostview, or print it out on a + Postscript-capable printer. + + + + + RTS options for heap profiling + + There are several different kinds of heap profile that can + be generated. All the different profile types yield a graph of + live heap against time, but they differ in how the live heap is + broken down into bands. The following RTS options select which + break-down to use: + + + + + RTS + option + + Breaks down the graph by the cost-centre stack which + produced the data. + + + + + + RTS + option + + Break down the live heap by the module containing + the code which produced the data. + + + + + + RTS + option + + Breaks down the graph by closure + description. For actual data, the description + is just the constructor name, for other closures it is a + compiler-generated string identifying the closure. 
+ + + + + + RTS + option + + Breaks down the graph by + type. For closures which have + function type or unknown/polymorphic type, the string will + represent an approximation to the actual type. + + + + + + RTS + option + + Break down the graph by retainer + set. Retainer profiling is described in more + detail below (). + + + + + + RTS + option + + Break down the graph by + biography. Biographical profiling + is described in more detail below (). + + + + + In addition, the profile can be restricted to heap data + which satisfies certain criteria - for example, you might want + to display a profile by type but only for data produced by a + certain module, or a profile by retainer for a certain type of + data. Restrictions are specified as follows: + + + + name,... + RTS + option + + Restrict the profile to closures produced by + cost-centre stacks with one of the specified cost centres + at the top. + + + + + name,... + RTS + option + + Restrict the profile to closures produced by + cost-centre stacks with one of the specified cost centres + anywhere in the stack. + + + + + module,... + RTS + option + + Restrict the profile to closures produced by the + specified modules. + + + + + desc,... + RTS + option + + Restrict the profile to closures with the specified + description strings. + + + + + type,... + RTS + option + + Restrict the profile to closures with the specified + types. + + + + + cc,... + RTS + option + + Restrict the profile to closures with retainer sets + containing cost-centre stacks with one of the specified + cost centres at the top. + + + + + bio,... + RTS + option + + Restrict the profile to closures with one of the + specified biographies, where + bio is one of + lag, drag, + void, or use. + + + + + For example, the following options will generate a + retainer profile restricted to Branch and + Leaf constructors: + + +prog +RTS -hr -hdBranch,Leaf + + + There can only be one "break-down" option + (eg. 
in the example above), but there is no + limit on the number of further restrictions that may be applied. + All the options may be combined, with one exception: GHC doesn't + currently support mixing the and + options. + + There's one more option which relates to heap + profiling: + + + + : + + + Set the profiling (sampling) interval to + secs seconds (the default is + 0.1 second). Fractions are allowed: for example + will get 5 samples per second. + This only affects heap profiling; time profiles are always + sampled on a 1/50 second frequency. + + + + + + + + Retainer Profiling + + Retainer profiling is designed to help answer questions + like why is this data being retained?. We start + by defining what we mean by a retainer: + +
+ A retainer is either the system stack, or an unevaluated + closure (thunk). +
+ + In particular, constructors are not + retainers. + + An object A is retained by an object B if object A can be + reached by recursively following pointers starting from object + B but not meeting any other retainers on the way. Each object + has one or more retainers, collectively called its + retainer set. + + When retainer profiling is requested by giving the program + the option, a graph is generated which is + broken down by retainer set. A retainer set is displayed as a + set of cost-centre stacks; because this is usually too large to + fit on the profile graph, each retainer set is numbered and + shown abbreviated on the graph along with its number, and the + full list of retainer sets is dumped into the file + prog.prof. + + Retainer profiling requires multiple passes over the live + heap in order to discover the full retainer set for each + object, which can be quite slow. So we set a limit on the + maximum size of a retainer set, where all retainer sets larger + than the maximum retainer set size are replaced by the special + set MANY. The maximum set size defaults to 8 + and can be altered with the RTS + option: + + + + size + + Restrict the number of elements in a retainer set to + size (default 8). + + + + + + Hints for using retainer profiling + + The definition of retainers is designed to reflect a + common cause of space leaks: a large structure is retained by + an unevaluated computation, and will be released once the + computation is forced. A good example is looking up a value in + a finite map, where unless the lookup is forced in a timely + manner the unevaluated lookup will cause the whole mapping to + be retained. This kind of space leak can often be + eliminated by forcing the relevant computations to be + performed eagerly, using seq or strictness + annotations on data constructor fields. 
+ + Often a particular data structure is being retained by a + chain of unevaluated closures, only the nearest of which will + be reported by retainer profiling - for example A retains B, B + retains C, and C retains a large structure. There might be a + large number of Bs but only a single A, so A is really the one + we're interested in eliminating. However, retainer profiling + will in this case report B as the retainer of the large + structure. To move further up the chain of retainers, we can + ask for another retainer profile but this time restrict the + profile to B objects, so we get a profile of the retainers of + B: + + +prog +RTS -hr -hcB + + + This trick isn't foolproof, because there might be other + B closures in the heap which aren't the retainers we are + interested in, but we've found this to be a useful technique + in most cases. + +
+ + + Biographical Profiling + + A typical heap object may be in one of the following four + states at each point in its lifetime: + + + + The lag stage, which is the + time between creation and the first use of the + object, + + + the use stage, which lasts from + the first use until the last use of the object, and + + + The drag stage, which lasts + from the final use until the last reference to the object + is dropped. + + + An object which is never used is said to be in the + void state for its whole + lifetime. + + + + A biographical heap profile displays the portion of the + live heap in each of the four states listed above. Usually the + most interesting states are the void and drag states: live heap + in these states is more likely to be wasted space than heap in + the lag or use states. + + It is also possible to break down the heap in one or more + of these states by a different criterion, by restricting a + profile by biography. For example, to show the portion of the + heap in the drag or void state by producer: + + +prog +RTS -hc -hbdrag,void + + + Once you know the producer or the type of the heap in the + drag or void states, the next step is usually to find the + retainer(s): + + +prog +RTS -hr -hccc... + + + NOTE: this two-stage process is required because GHC + cannot currently profile using both biographical and retainer + information simultaneously. + + + 
+ + + Graphical time/allocation profile + + You can view the time and allocation profiling graph of your + program graphically, using ghcprof. This is a + new tool with GHC 4.08, and will eventually be the de-facto + standard way of viewing GHC profiles (Actually this + isn't true any more, we are working on a new tool for + displaying heap profiles using Gtk+HS, so + ghcprof may go away at some point in the future.) + + + To run ghcprof, you need + daVinci installed, which can be + obtained from The Graph + Visualisation Tool daVinci. Install one of + the binary + distributions (daVinci is + sadly not open-source :-(.), and set your + DAVINCIHOME environment variable to point to the + installation directory. + + ghcprof uses an XML-based profiling log + format, and you therefore need to run your program with a + different option: . The file generated is + still called <prog>.prof. To see the + profile, run ghcprof like this: + + + + +$ ghcprof <prog>.prof + + + which should pop up a window showing the call-graph of your + program in glorious detail. More information on using + ghcprof can be found at The + Cost-Centre Stack Profiling Tool for + GHC. + + + <command>hp2ps</command>--heap profile to PostScript -- 1.7.10.4
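
The finite-map leak described in the retainer profiling hints of this patch can be sketched roughly as follows. This is an illustrative program, not part of the commit: the names Summary, StrictSummary, bigMap, lazySummary, eagerSummary and strictSummary are hypothetical, and it assumes the Data.Map module from the containers package rather than whichever finite map library was current when the patch was written. The lazy version stores an unevaluated lookup, a thunk that still points at the whole map; the other two force the lookup with seq or with a strictness annotation on the constructor field, which are the two remedies the text names.

```haskell
module Main (main) where

import qualified Data.Map as Map

-- A record-like type with a lazy field: the field may hold a thunk.
data Summary = Summary String (Maybe Int)

-- The same shape with a strictness annotation on the second field,
-- so the field is evaluated when the constructor is evaluated.
data StrictSummary = StrictSummary String !(Maybe Int)

-- Stands in for the "large structure" in the text (hypothetical data).
bigMap :: Int -> Map.Map Int Int
bigMap n = Map.fromList [(k, k * k) | k <- [1 .. n]]

-- Leaky pattern: the lookup is stored unevaluated, so the thunk
-- keeps a pointer to the whole map until someone demands the field.
lazySummary :: Map.Map Int Int -> Summary
lazySummary m = Summary "lazy" (Map.lookup 42 m)

-- Remedy 1: force the lookup with seq before building the result.
eagerSummary :: Map.Map Int Int -> Summary
eagerSummary m =
  let v = Map.lookup 42 m
  in v `seq` Summary "eager" v

-- Remedy 2: rely on the strict field to force the lookup.
strictSummary :: Map.Map Int Int -> StrictSummary
strictSummary m = StrictSummary "strict" (Map.lookup 42 m)

showSummary :: Summary -> String
showSummary (Summary name v) = name ++ ": " ++ show v

main :: IO ()
main = do
  putStrLn (showSummary (lazySummary (bigMap 100000)))
  putStrLn (showSummary (eagerSummary (bigMap 100000)))
  case strictSummary (bigMap 100000) of
    StrictSummary name v -> putStrLn (name ++ ": " ++ show v)
```

In a program this small the map is dropped almost immediately either way, so a heap profile of it is unlikely to show much; the difference only matters when a value like Summary outlives the map it was computed from. In that situation a retainer profile (the prog +RTS -hr form shown in the patch) would be expected to attribute the map to the cost-centre stack of the unevaluated lookup.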