X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=ghc%2Fdocs%2Fusers_guide%2Fprofiling.sgml;h=e79e63824af5a1fb52504c45f3c8e6e8c43fa3f8;hb=57d3c7c8a8523e375d49d2ac036210b6dcf7ec9c;hp=30175aed34d08c7315145a30fe1a80a0630f1376;hpb=66e87ae1ac00d54df5024033fda5d08db99177a4;p=ghc-hetmet.git diff --git a/ghc/docs/users_guide/profiling.sgml b/ghc/docs/users_guide/profiling.sgml index 30175ae..e79e638 100644 --- a/ghc/docs/users_guide/profiling.sgml +++ b/ghc/docs/users_guide/profiling.sgml @@ -4,13 +4,13 @@ cost-centre profiling - Glasgow Haskell comes with a time and space profiling + Glasgow Haskell comes with a time and space profiling system. Its purpose is to help you improve your understanding of - your program's execution behaviour, so you can improve it. + your program's execution behaviour, so you can improve it. - Any comments, suggestions and/or improvements you have are + Any comments, suggestions and/or improvements you have are welcome. Recommended “profiling tricks” would be - especially cool! + especially cool! Profiling a program is a three-step process: @@ -30,12 +30,10 @@ - Run your program with one of the profiling options - -p or -h. This generates - a file of profiling information. - -pRTS - option - -hRTS + Run your program with one of the profiling options, eg. + +RTS -p -RTS. This generates a file of + profiling information. 
+ RTS option @@ -47,7 +45,7 @@ - + Cost centres and cost-centre stacks GHC's profiling system assigns costs @@ -81,7 +79,7 @@ $ will contain something like this: - Tue Apr 18 12:52 2000 Time and Allocation Profiling Report (Final) + Fri May 12 14:06 2000 Time and Allocation Profiling Report (Final) Main +RTS -p -RTS @@ -93,15 +91,16 @@ COST CENTRE MODULE %time %alloc nfib Main 100.0 100.0 -COST CENTRE MODULE scc %time %alloc inner cafs + individual inherited +COST CENTRE MODULE entries %time %alloc %time %alloc -MAIN MAIN 0 0.0 0.0 0 1 - main Main 0 0.0 0.0 0 1 - CAF PrelHandle 3 0.0 0.0 0 3 - CAF PrelAddr 1 0.0 0.0 0 0 - CAF Main 6 0.0 0.0 1 0 - main Main 1 0.0 0.0 1 1 - nfib Main 242785 100.0 100.0 242784 4 +MAIN MAIN 0 0.0 0.0 100.0 100.0 + main Main 0 0.0 0.0 0.0 0.0 + CAF PrelHandle 3 0.0 0.0 0.0 0.0 + CAF PrelAddr 1 0.0 0.0 0.0 0.0 + CAF Main 6 0.0 0.0 100.0 100.0 + main Main 1 0.0 0.0 100.0 100.0 + nfib Main 242785 100.0 100.0 100.0 100.0 @@ -125,6 +124,12 @@ MAIN MAIN 0 0.0 0.0 0 1 the costly call to nfib came from main. + The time and allocation incurred by a given part of the + program is displayed in two ways: “individual”, which + are the costs incurred by the code covered by this cost centre + stack alone, and “inherited”, which includes the costs + incurred by all the children of this node. 
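The relationship between the two kinds of column can be seen in a small sketch (the function names `foo` and `bar` are invented for illustration, not taken from the report above). `bar` does all the real work, and `foo` merely calls it, so in a profile `foo`'s individual columns stay near zero while its inherited columns include everything done by `bar`:

```haskell
module Main where

-- bar does the allocation; foo is cheap itself but inherits bar's
-- costs, because bar appears below foo in the cost-centre stack.

bar :: Int -> Int
bar n = {-# SCC "bar" #-} sum [1 .. n]   -- the real work happens here

foo :: Int -> Int
foo n = {-# SCC "foo" #-} bar n + 1      -- near-zero individual cost

main :: IO ()
main = print (foo 100000)
```

Compiled with <option>-prof</option> and run with <option>+RTS -p</option>, `foo`'s inherited %time/%alloc would be roughly equal to `bar`'s individual figures.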
+ The usefulness of cost-centre stacks is better demonstrated by modifying the example slightly: @@ -139,18 +144,18 @@ nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) the new profiling results: -COST CENTRE MODULE scc %time %alloc inner cafs - -MAIN MAIN 0 0.0 0.0 0 1 - main Main 0 0.0 0.0 0 1 - CAF PrelHandle 3 0.0 0.0 0 3 - CAF PrelAddr 1 0.0 0.0 0 0 - CAF Main 9 0.0 0.0 1 1 - main Main 1 0.0 0.0 2 2 - g Main 1 0.0 0.0 1 3 - nfib Main 465 0.0 0.2 464 0 - f Main 1 0.0 0.0 1 1 - nfib Main 242785 100.0 99.8 242784 1 +COST CENTRE MODULE scc %time %alloc %time %alloc + +MAIN MAIN 0 0.0 0.0 100.0 100.0 + main Main 0 0.0 0.0 0.0 0.0 + CAF PrelHandle 3 0.0 0.0 0.0 0.0 + CAF PrelAddr 1 0.0 0.0 0.0 0.0 + CAF Main 9 0.0 0.0 100.0 100.0 + main Main 1 0.0 0.0 100.0 100.0 + g Main 1 0.0 0.0 0.0 0.2 + nfib Main 465 0.0 0.2 0.0 0.2 + f Main 1 0.0 0.0 100.0 99.8 + nfib Main 242785 100.0 99.8 100.0 99.8 Now although we had two calls to nfib @@ -161,7 +166,7 @@ MAIN MAIN 0 0.0 0.0 0 1 - scc + entries The number of times this particular point in the call graph was entered. @@ -169,7 +174,7 @@ MAIN MAIN 0 0.0 0.0 0 1 - %time + individual %time The percentage of the total run time of the program spent at this point in the call graph. @@ -177,7 +182,7 @@ MAIN MAIN 0 0.0 0.0 0 1 - %alloc + individual %alloc The percentage of the total memory allocations (excluding profiling overheads) of the program made by this @@ -186,19 +191,19 @@ MAIN MAIN 0 0.0 0.0 0 1 - inner + inherited %time - The number of times an inner call-graph context was - entered from here (including recursive calls). + The percentage of the total run time of the program + spent below this point in the call graph. - cafs + inherited %alloc - The number of times a CAF context was entered from - here. CAFs are described in . + The percentage of the total memory allocations + (excluding profiling overheads) of the program made by this + call and all of its sub-calls. 
@@ -211,20 +216,20 @@ MAIN MAIN 0 0.0 0.0 0 1
 	    ticks
 	    
 	      
-	      The raw number of time “ticks” which were
+	      The raw number of time “ticks” which were
 	      attributed to this cost-centre; from this, we get the
 	      %time figure mentioned
-	      above.
+	      above.
 	    
 	  
 	  
 	    bytes
 	    
 	      
-	      Number of bytes allocated in the heap while in this
+	      Number of bytes allocated in the heap while in this
 	      cost-centre; again, this is the raw number from which we get
 	      the %alloc figure mentioned
-	      above.
+	      above.
 	    
 	  
 	
@@ -249,14 +254,14 @@ MAIN MAIN 0 0.0 0.0 0 1
 
       The syntax of a cost centre annotation is
 
-	_scc_ "name" <expression>
+	{-# SCC "name" #-} <expression>
 
       where "name" is an arbitrary string that will become the
       name of your cost centre as it appears in the profiling output,
      and <expression> is any Haskell
-      expression.  An _scc_ annotation extends as
+      expression.  An SCC annotation extends as
      far to the right as possible when parsing.
 
@@ -272,14 +277,14 @@ MAIN MAIN 0 0.0 0.0 0 1
 
 	  If the expression is part of the one-off costs of evaluating
 	  the enclosing top-level definition, then costs are attributed to
-	  the stack of lexically enclosing _scc_
+	  the stack of lexically enclosing SCC
 	  annotations on top of the special CAF
 	  cost-centre.
 
 
 
 	  Otherwise, costs are attributed to the stack of
-	  lexically-enclosing _scc_ annotations,
+	  lexically-enclosing SCC annotations,
 	  appended to the cost-centre stack in effect at the call
 	  site of the current top-level definition
 	  The call-site is just the place
@@ -287,6 +292,12 @@ MAIN MAIN 0 0.0 0.0 0 1
 	  variable..  Notice that this is a recursive
 	  definition.
 
+
+      
+	Time spent in foreign code (see )
+	is always attributed to the cost centre in force at the
+	Haskell call-site of the foreign function.
+      
 
       What do we mean by one-off costs?  Well, Haskell is a lazy
@@ -322,143 +333,80 @@ x = nfib 25
       doesn't look like you expect it to, feel free to send it (and
       your program) to us at glasgow-haskell-bugs@haskell.org.
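The scoping and CAF rules above can be sketched in a few lines (the names `both` and `table` are invented for illustration). The annotation extends as far to the right as possible, so the first SCC covers the whole of `f x + g x`, not just `f x`; the second sits inside a CAF, so its one-off evaluation cost lands on the CAF cost-centre stack rather than on whichever caller happens to force it first:

```haskell
module Main where

f, g :: Int -> Int
f y = y * 2
g y = y + 1

-- The SCC extends to the whole expression (f x + g x):
both :: Int -> Int
both x = {-# SCC "both" #-} f x + g x

-- A top-level definition with no arguments is a CAF; its one-off
-- cost is attributed to the special CAF cost centre plus the
-- enclosing SCC, not to the call site that forces it.
table :: [Int]
table = {-# SCC "build_table" #-} map (* 2) [1 .. 1000]

main :: IO ()
main = print (both 10 + sum table)
```

Without <option>-prof</option> the pragmas are simply ignored, so the same source compiles either way.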
- - - Profiling memory usage - - In addition to profiling the time and allocation behaviour - of your program, you can also generate a graph of its memory usage - over time. This is useful for detecting the causes of - space leaks, when your program holds on to - more memory at run-time that it needs to. Space leaks lead to - longer run-times due to heavy garbage collector ativity, and may - even cause the program to run out of memory altogether. - - To generate a heap profile from your program, compile it as - before, but this time run it with the runtime - option. This generates a file - <prog>.hp file, which you then process - with hp2ps to produce a Postscript file - <prog>.ps. The Postscript file can be - viewed with something like ghostview, or - printed out on a Postscript-compatible printer. - - For the RTS options that control the kind of heap profile - generated, see . Details on the - usage of the hp2ps program are given in - - - - - Graphical time/allocation profile - - You can view the time and allocation profiling graph of your - program graphically, using ghcprof. This is a - new tool with GHC 4.07, and will eventually be the de-facto - standard way of viewing GHC profiles. - - To run ghcprof, you need - daVinci installed, which can be - obtained from The Graph - Visualisation Tool daVinci. Install one of - the binary - distributionsdaVinci is - sadly not open-source :-(., and set your - DAVINCIHOME environment variable to point to the - installation directory. - - ghcprof uses an XML-based profiling log - format, and you therefore need to run your program with a - different option: . The file generated is - still called <prog>.prof. To see the - profile, run ghcprof like this: - - - - -$ ghcprof <prog>.prof - - - which should pop up a window showing the call-graph of your - program in glorious detail. More information on using - ghcprof can be found at The - Cost-Centre Stack Profiling Tool for - GHC. 
- - - Compiler options for profiling profilingoptions optionsfor profiling - To make use of the cost centre profiling system - all modules must be compiled and linked with - the option. Any - _scc_ constructs you've put in - your source will spring to life. - - -prof - - Without a option, your - _scc_s are ignored; so you can - compiled _scc_-laden code - without changing it. - - There are a few other profiling-related compilation options. - Use them in addition to - . These do not have to be used consistently - for all modules in a program. - - - : - -auto + : + + + To make use of the profiling system + all modules must be compiled and linked + with the option. Any + SCC annotations you've put in your source + will spring to life. + + Without a option, your + SCCs are ignored; so you can compile + SCC-laden code without changing + it. + + + + + There are a few other profiling-related compilation options. + Use them in addition to + . These do not have to be used consistently + for all modules in a program. + + + + : + cost centresautomatically inserting - GHC will automatically add + GHC will automatically add _scc_ constructs for all - top-level, exported functions. + top-level, exported functions. - : - -auto-all + : + - All top-level functions, + All top-level functions, exported or not, will be automatically - _scc_'d. + _scc_'d. - : - -caf-all + : + - The costs of all CAFs in a module are usually + The costs of all CAFs in a module are usually attributed to one “big” CAF cost-centre. With this option, all CAFs get their own cost-centre. An - “if all else fails” option… + “if all else fails” option… - : - -ignore-scc + : + - Ignore any _scc_ + Ignore any _scc_ constructs, so a module which already has _scc_s can be compiled - for profiling with the annotations ignored. + for profiling with the annotations ignored. 
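The effect of <option>-auto-all</option> can be pictured roughly as follows: it is a sketch of the hand-written equivalent, not the compiler's literal transformation, and the function names are invented. Every top-level definition behaves as if wrapped in an SCC bearing its own name:

```haskell
module Main where

-- What -prof -auto-all approximately does by itself: each
-- top-level function gets an SCC named after it, so the profile
-- breaks time and allocation down per function with no manual
-- annotation.

step :: Int -> Int
step n = {-# SCC "step" #-} n * n + 1

run :: Int -> Int
run n = {-# SCC "run" #-} sum (map step [1 .. n])

main :: IO ()
main = print (run 100)
```

With <option>-auto</option> alone, only the exported top-level functions would be annotated this way.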
@@ -466,47 +414,29 @@ $ ghcprof <prog>.prof - - Runtime options for profiling - - profiling RTS options - RTS options, for profiling + + Time and allocation profiling - It isn't enough to compile your program for profiling with - ! - - When you run your profiled program, you - must tell the runtime system (RTS) what you want to profile (e.g., - time and/or space), and how you wish the collected data to be - reported. You also may wish to set the sampling interval used in - time profiling. - - Executive summary: ./a.out +RTS -pT - produces a time profile in a.out.prof; - ./a.out +RTS -hC produces space-profiling info - which can be mangled by hp2ps and viewed with - ghostview (or equivalent). - - Profiling runtime flags are passed to your program between - the usual and - options. + To generate a time and allocation profile, give one of the + following RTS options to the compiled program when you run it (RTS + options should be enclosed between +RTS...-RTS + as usual): - or : time profile - The option produces a standard + The option produces a standard time profile report. It is written into the file - <program>.prof. + program.prof. - The option produces a more + The option produces a more detailed report containing the actual time and allocation - data as well. (Not used much.) + data as well. (Not used much.) @@ -521,89 +451,471 @@ $ ghcprof <prog>.prof - : - - - Set the profiling (sampling) interval to - <secs> seconds (the default is - 1 second). Fractions are allowed: for example - will get 5 samples per second. This - only affects heap profiling; time profiles are always - sampled on a 1/50 second frequency. - + + RTS + option + + This option makes use of the extra information + maintained by the cost-centre-stack profiler to provide + useful information about the location of runtime errors. + See . + - - : - - heap profile - - Produce a detailed heap profile - of the heap occupied by live closures. 
The profile is - written to the file <program>.hp - from which a PostScript graph can be produced using - hp2ps (see ). - - The heap space profile may be broken down by different - criteria: - - - - - : - - cost centre which produced the closure (the - default). - - - - - : - - cost centre module which produced the - closure. - - - - - : - - closure description—a string describing - the closure. - - - - - : - - closure type—a string describing the - closure's type. - - - + + + - - + + Profiling memory usage - - : - + In addition to profiling the time and allocation behaviour + of your program, you can also generate a graph of its memory usage + over time. This is useful for detecting the causes of + space leaks, when your program holds on to + more memory at run-time that it needs to. Space leaks lead to + longer run-times due to heavy garbage collector ativity, and may + even cause the program to run out of memory altogether. + + To generate a heap profile from your program: + + + + Compile the program for profiling (). + + + Run it with one of the heap profiling options described + below (eg. for a basic producer profile). + This generates the file + prog.hp. + + + Run hp2ps to produce a Postscript + file, + prog.ps. The + hp2ps utility is described in detail in + . + + + Display the heap profile using a postscript viewer such + as Ghostview, or print it out on a + Postscript-capable printer. + + + + + RTS options for heap profiling + + There are several different kinds of heap profile that can + be generated. All the different profile types yield a graph of + live heap against time, but they differ in how the live heap is + broken down into bands. The following RTS options select which + break-down to use: + + + + + RTS + option + + Breaks down the graph by the cost-centre stack which + produced the data. + + + + + + RTS + option + + Break down the live heap by the module containing + the code which produced the data. 
+ + + + + + RTS + option + + Breaks down the graph by closure + description. For actual data, the description + is just the constructor name, for other closures it is a + compiler-generated string identifying the closure. + + + + + + RTS + option + + Breaks down the graph by + type. For closures which have + function type or unknown/polymorphic type, the string will + represent an approximation to the actual type. + + + + + + RTS + option + + Break down the graph by retainer + set. Retainer profiling is described in more + detail below (). + + + + + + RTS + option + + Break down the graph by + biography. Biographical profiling + is described in more detail below (). + + + + + In addition, the profile can be restricted to heap data + which satisfies certain criteria - for example, you might want + to display a profile by type but only for data produced by a + certain module, or a profile by retainer for a certain type of + data. Restrictions are specified as follows: + + + + name,... + RTS + option + + Restrict the profile to closures produced by + cost-centre stacks with one of the specified cost centres + at the top. + + + + + name,... + RTS + option + + Restrict the profile to closures produced by + cost-centre stacks with one of the specified cost centres + anywhere in the stack. + + + + + module,... + RTS + option + + Restrict the profile to closures produced by the + specified modules. + + + + + desc,... + RTS + option + + Restrict the profile to closures with the specified + description strings. + + + + + type,... + RTS + option + + Restrict the profile to closures with the specified + types. + + + + + cc,... + RTS + option + + Restrict the profile to closures with retainer sets + containing cost-centre stacks with one of the specified + cost centres at the top. + + + + + bio,... + RTS + option + + Restrict the profile to closures with one of the + specified biographies, where + bio is one of + lag, drag, + void, or use. 
+ + + + + For example, the following options will generate a + retainer profile restricted to Branch and + Leaf constructors: + + +prog +RTS -hr -hdBranch,Leaf + + + There can only be one "break-down" option + (eg. in the example above), but there is no + limit on the number of further restrictions that may be applied. + All the options may be combined, with one exception: GHC doesn't + currently support mixing the and + options. + + There are two more options which relate to heap + profiling: + + + + : + + + Set the profiling (sampling) interval to + secs seconds (the default is + 0.1 second). Fractions are allowed: for example + will get 5 samples per second. + This only affects heap profiling; time profiles are always + sampled on a 1/50 second frequency. + + + + + + RTS option + + + Include the memory occupied by threads in a heap + profile. Each thread takes up a small area for its thread + state in addition to the space allocated for its stack + (stacks normally start small and then grow as + necessary). + + This includes the main thread, so using + is a good way to see how much stack + space the program is using. + + Memory occupied by threads and their stacks is + labelled as “TSO” when displaying the profile + by closure description or type description. + + + + + + + + Retainer Profiling + + Retainer profiling is designed to help answer questions + like why is this data being retained?. We start + by defining what we mean by a retainer: + +
+ A retainer is either the system stack, or an unevaluated + closure (thunk). +
In particular, constructors are not
      retainers.

      An object A is retained by an object B if object A can be
      reached by recursively following pointers starting from object
      B but not meeting any other retainers on the way.  Each object
      has one or more retainers, collectively called its
      retainer set.

      When retainer profiling is requested by giving the program
      the  option, a graph is generated which is
      broken down by retainer set.  A retainer set is displayed as a
      set of cost-centre stacks; because this is usually too large to
      fit on the profile graph, each retainer set is numbered and
      shown abbreviated on the graph along with its number, and the
      full list of retainer sets is dumped into the file
      prog.prof.

      Retainer profiling requires multiple passes over the live
      heap in order to discover the full retainer set for each
      object, which can be quite slow.  So we set a limit on the
      maximum size of a retainer set, where all retainer sets larger
      than the maximum retainer set size are replaced by the special
      set MANY.  The maximum set size defaults to 8
      and can be altered with the  RTS
      option:

      
	
	  size
	  
	    Restrict the number of elements in a retainer set to
	    size (default 8).
	  
	
      

      
	Hints for using retainer profiling

	The definition of retainers is designed to reflect a
	common cause of space leaks: a large structure is retained by
	an unevaluated computation, and will be released once the
	computation is forced.  A good example is looking up a value in
	a finite map, where unless the lookup is forced in a timely
	manner the unevaluated lookup will cause the whole mapping to
	be retained.  These kinds of space leaks can often be
	eliminated by forcing the relevant computations to be
	performed eagerly, using seq or strictness
	annotations on data constructor fields.
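The finite-map example above might look like this in code (the function names are invented; `Data.Map` comes with GHC in the containers package). The lazy lookup leaves behind an unevaluated closure that retains the whole map, while forcing the result with `seq` lets the map be garbage-collected as soon as the lookup has run:

```haskell
module Main where

import qualified Data.Map as Map

-- If the caller never forces the result, this application sits on
-- the heap as a thunk retaining all of m:
lazyLookup :: Map.Map Int String -> Maybe String
lazyLookup m = Map.lookup 42 m

-- Forcing the lookup eagerly means m is no longer reachable
-- through the result, so it can be collected:
strictLookup :: Map.Map Int String -> Maybe String
strictLookup m = let r = Map.lookup 42 m
                 in r `seq` r

main :: IO ()
main = do
  let m = Map.fromList [(i, show i) | i <- [1 .. 10000]]
  print (strictLookup m)
```

A retainer profile of the lazy version would show the map's heap retained by the unevaluated lookup closure rather than by any constructor.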
+ + Often a particular data structure is being retained by a + chain of unevaluated closures, only the nearest of which will + be reported by retainer profiling - for example A retains B, B + retains C, and C retains a large structure. There might be a + large number of Bs but only a single A, so A is really the one + we're interested in eliminating. However, retainer profiling + will in this case report B as the retainer of the large + structure. To move further up the chain of retainers, we can + ask for another retainer profile but this time restrict the + profile to B objects, so we get a profile of the retainers of + B: + + +prog +RTS -hr -hcB + + + This trick isn't foolproof, because there might be other + B closures in the heap which aren't the retainers we are + interested in, but we've found this to be a useful technique + in most cases. + +
+ + + Biographical Profiling + + A typical heap object may be in one of the following four + states at each point in its lifetime: + + - The option generates heap - profiling information in the XML format understood by our - new profiling tool (NOTE: heap profiling with the new tool - is not yet working! Use hp2ps-style heap - profiling for the time being). + The lag stage, which is the + time between creation and the first use of the + object, -
+ + the use stage, which lasts from + the first use until the last use of the object, and + + + The drag stage, which lasts + from the final use until the last reference to the object + is dropped. + + + An object which is never used is said to be in the + void state for its whole + lifetime. + + + + A biographical heap profile displays the portion of the + live heap in each of the four states listed above. Usually the + most interesting states are the void and drag states: live heap + in these states is more likely to be wasted space than heap in + the lag or use states. + + It is also possible to break down the heap in one or more + of these states by a different criteria, by restricting a + profile by biography. For example, to show the portion of the + heap in the drag or void state by producer: + + +prog +RTS -hc -hbdrag,void + + + Once you know the producer or the type of the heap in the + drag or void states, the next step is usually to find the + retainer(s): + + +prog +RTS -hr -hccc... + + + NOTE: this two stage process is required because GHC + cannot currently profile using both biographical and retainer + information simultaneously. + + + + + + +
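A drag stage can be provoked deliberately, which is useful for checking that a biographical profile behaves as expected. The following is an invented sketch: the list is created and used once, but stays reachable through an `IORef` that is never read, so between the first `print` and the final write its cells are live heap in the drag state:

```haskell
module Main where

import Data.IORef

buildList :: Int -> [Int]
buildList n = [1 .. n]

main :: IO ()
main = do
  let xs = buildList 200000
  print (sum xs)        -- creation and last use of xs
  ref <- newIORef xs    -- keeps xs reachable, but never uses it again
  putStrLn "working..." -- xs is in its drag stage here
  writeIORef ref []     -- reference dropped: the drag stage ends
```

Run with <option>+RTS -hb</option>, a program shaped like this should show a band of drag between the two events; restricting with <option>-hbdrag,void</option> isolates it.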
+ + + Graphical time/allocation profile + + You can view the time and allocation profiling graph of your + program graphically, using ghcprof. This is a + new tool with GHC 4.08, and will eventually be the de-facto + standard way of viewing GHC profilesActually this + isn't true any more, we are working on a new tool for + displaying heap profiles using Gtk+HS, so + ghcprof may go away at some point in the future. + + + To run ghcprof, you need + daVinci installed, which can be + obtained from The Graph + Visualisation Tool daVinci. Install one of + the binary + distributionsdaVinci is + sadly not open-source :-(., and set your + DAVINCIHOME environment variable to point to the + installation directory. + + ghcprof uses an XML-based profiling log + format, and you therefore need to run your program with a + different option: . The file generated is + still called <prog>.prof. To see the + profile, run ghcprof like this: + + + + +$ ghcprof <prog>.prof + + + which should pop up a window showing the call-graph of your + program in glorious detail. More information on using + ghcprof can be found at The + Cost-Centre Stack Profiling Tool for + GHC. - - - <command>hp2ps</command>--heap profile to PostScript + <command>hp2ps</command>––heap profile to PostScript hp2ps heap profiles @@ -777,6 +1089,132 @@ hp2ps [flags] [<file>[.hp]] + + + + Manipulating the hp file + +(Notes kindly offered by Jan-Willhem Maessen.) + + +The FOO.hp file produced when you ask for the +heap profile of a program FOO is a text file with a particularly +simple structure. Here's a representative example, with much of the +actual data omitted: + +JOB "FOO -hC" +DATE "Thu Dec 26 18:17 2002" +SAMPLE_UNIT "seconds" +VALUE_UNIT "bytes" +BEGIN_SAMPLE 0.00 +END_SAMPLE 0.00 +BEGIN_SAMPLE 15.07 + ... sample data ... +END_SAMPLE 15.07 +BEGIN_SAMPLE 30.23 + ... sample data ... +END_SAMPLE 30.23 +... etc. 
BEGIN_SAMPLE 11695.47
END_SAMPLE 11695.47

The first four lines (JOB, DATE, SAMPLE_UNIT, VALUE_UNIT) form a
header.  Each block of lines starting with BEGIN_SAMPLE and ending
with END_SAMPLE forms a single sample (you can think of this as a
vertical slice of your heap profile).  The hp2ps utility should accept
any input with a properly-formatted header followed by a series of
*complete* samples.




 Zooming in on regions of your profile


You can look at particular regions of your profile simply by loading a
copy of the .hp file into a text editor and deleting the unwanted
samples.  The resulting .hp file can be run through hp2ps and viewed
or printed.




 Viewing the heap profile of a running program


The .hp file is generated incrementally as your
program runs.  In principle, running hp2ps on the incomplete file
should produce a snapshot of your program's heap usage.  However, the
last sample in the file may be incomplete, causing hp2ps to fail.  If
you are using a machine with UNIX utilities installed, it's not too
hard to work around this problem (though the resulting command line
looks rather Byzantine):

 head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
   | hp2ps > FOO.ps


The command fgrep -n END_SAMPLE FOO.hp finds the
end of every complete sample in FOO.hp, and labels each sample with
its ending line number.  We then select the line number of the last
complete sample using tail and cut.  This is used as a
parameter to head; the result is as if we deleted the final
incomplete sample from FOO.hp.  This results in a properly-formatted
.hp file which we feed directly to hp2ps.


 Viewing a heap profile in real time


The gv and ghostview programs
have a "watch file" option that can be used to view an up-to-date heap
profile of your program as it runs.  Simply generate an incremental
heap profile as described in the previous section.
Run gv on your +profile: + + gv -watch -seascape FOO.ps + +If you forget the -watch flag you can still select +"Watch file" from the "State" menu. Now each time you generate a new +profile FOO.ps the view will update automatically. + + + +This can all be encapsulated in a little script: + + #!/bin/sh + head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ + | hp2ps > FOO.ps + gv -watch -seascape FOO.ps & + while [ 1 ] ; do + sleep 10 # We generate a new profile every 10 seconds. + head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ + | hp2ps > FOO.ps + done + +Occasionally gv will choke as it tries to read an incomplete copy of +FOO.ps (because hp2ps is still running as an update +occurs). A slightly more complicated script works around this +problem, by using the fact that sending a SIGHUP to gv will cause it +to re-read its input file: + + #!/bin/sh + head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ + | hp2ps > FOO.ps + gv FOO.ps & + gvpsnum=$! + while [ 1 ] ; do + sleep 10 + head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ + | hp2ps > FOO.ps + kill -HUP $gvpsnum + done + + + + +
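The truncate-at-the-last-END_SAMPLE step used in the shell pipelines above can also be written as a small Haskell filter, which avoids the `fgrep`/`tail`/`cut` juggling. This is a sketch; `TruncateHp.hs` and `FOO.hp` are placeholder names:

```haskell
module Main where

import Data.List (isPrefixOf)

-- Keep everything up to and including the last END_SAMPLE line,
-- dropping any trailing partial sample, so the output is always a
-- well-formed .hp file.  Usage:
--
--   runghc TruncateHp.hs < FOO.hp | hp2ps > FOO.ps

lastComplete :: [String] -> [String]
lastComplete ls =
  case [ i | (i, l) <- zip [1 ..] ls, "END_SAMPLE" `isPrefixOf` l ] of
    []      -> ls                      -- no complete sample yet
    endings -> take (last endings) ls  -- cut after the last END_SAMPLE

main :: IO ()
main = interact (unlines . lastComplete . lines)
```

Because `interact` reads lazily from standard input, the same program works in the watch-file loops above in place of the `head`/`fgrep` pipeline.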