X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=ghc%2Fdocs%2Fusers_guide%2Fprofiling.sgml;h=25ea4bca1ff35e55570e7eaed0b316ba7b29b2d7;hb=2dfd507259664e6f28df4a9467a8de34d01d70a0;hp=e79e63824af5a1fb52504c45f3c8e6e8c43fa3f8;hpb=dc801dc275fb8f81d482535b4d6317e234bb10f8;p=ghc-hetmet.git diff --git a/ghc/docs/users_guide/profiling.sgml b/ghc/docs/users_guide/profiling.sgml index e79e638..25ea4bc 100644 --- a/ghc/docs/users_guide/profiling.sgml +++ b/ghc/docs/users_guide/profiling.sgml @@ -1,5 +1,5 @@ - Profiling + Profiling profiling cost-centre profiling @@ -20,7 +20,7 @@ -prof option, and probably one of the -auto or -auto-all options. These options are described in more detail in + linkend="prof-compiler-options"/> -prof -auto @@ -61,7 +61,7 @@ main = print (nfib 25) -nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) +nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) Compile and run this program as follows: @@ -137,7 +137,7 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0 main = print (f 25 + g 25) f n = nfib n g n = nfib (n `div` 2) -nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) +nfib n = if n < 2 then 1 else nfib (n-1) + nfib (n-2) Compile and run this program as before, and take a look at @@ -208,7 +208,7 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0 - In addition you can use the RTS option + In addition you can use the RTS option to get the following additional information: @@ -225,13 +225,13 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0 bytes - + Number of bytes allocated in the heap while in this cost-centre; again, this is the raw number from which we get the %alloc figure mentioned above. - - + + What about recursive functions, and mutually recursive @@ -240,7 +240,7 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0 called each other recursively, this information isn't displayed in the basic time and allocation profile, instead the call-graph is flattened into a tree. 
The XML profiling tool (described in ) will be able to display real loops in + linkend="prof-xml-tool"/>) will be able to display real loops in the call-graph. Inserting cost centres by hand @@ -294,7 +294,7 @@ MAIN MAIN 0 0.0 0.0 100.0 100.0 - Time spent in foreign code (see ) + Time spent in foreign code (see ) is always attributed to the cost centre in force at the Haskell call-site of the foreign function. @@ -342,11 +342,11 @@ x = nfib 25 profilingoptions optionsfor profiling - - - : + + + : - + To make use of the profiling system all modules must be compiled and linked with the option. Any @@ -357,8 +357,8 @@ x = nfib 25 SCCs are ignored; so you can compile SCC-laden code without changing it. - - + + There are a few other profiling-related compilation options. @@ -368,73 +368,73 @@ x = nfib 25 - : + : cost centresautomatically inserting - + GHC will automatically add - _scc_ constructs for all + _scc_ constructs for all top-level, exported functions. - - + + - - : + + : - - All top-level functions, + + All top-level functions, exported or not, will be automatically - _scc_'d. - - + _scc_'d. + + - - : + + : - + The costs of all CAFs in a module are usually attributed to one “big” CAF cost-centre. With this option, all CAFs get their own cost-centre. An “if all else fails” option… - - + + - - : + + : - - Ignore any _scc_ + + Ignore any _scc_ constructs, so a module which already has - _scc_s can be compiled + _scc_s can be compiled for profiling with the annotations ignored. - - + + - + - Time and allocation profiling + Time and allocation profiling To generate a time and allocation profile, give one of the following RTS options to the compiled program when you run it (RTS options should be enclosed between +RTS...-RTS as usual): - - - or : + + + or : time profile - - The option produces a standard - time profile report. It is written + + The option produces a standard + time profile report. It is written into the file - program.prof. + program.prof. 
- The option produces a more + The option produces a more detailed report containing the actual time and allocation data as well. (Not used much.) @@ -446,7 +446,7 @@ x = nfib 25 The option generates profiling information in the XML format understood by our new - profiling tool, see . + profiling tool, see . @@ -458,11 +458,11 @@ x = nfib 25 This option makes use of the extra information maintained by the cost-centre-stack profiler to provide useful information about the location of runtime errors. - See . + See . - + @@ -482,7 +482,7 @@ x = nfib 25 Compile the program for profiling (). + linkend="prof-compiler-options"/>). Run it with one of the heap profiling options described @@ -495,7 +495,7 @@ x = nfib 25 file, prog.ps. The hp2ps utility is described in detail in - . + . Display the heap profile using a postscript viewer such @@ -565,7 +565,7 @@ x = nfib 25 Break down the graph by retainer set. Retainer profiling is described in more - detail below (). + detail below (). @@ -577,7 +577,7 @@ x = nfib 25 Break down the graph by biography. Biographical profiling is described in more detail below (). + linkend="biography-prof"/>). @@ -686,16 +686,16 @@ x = nfib 25 - : + : - + Set the profiling (sampling) interval to secs seconds (the default is 0.1 second). Fractions are allowed: for example - will get 5 samples per second. + will get 5 samples per second. This only affects heap profiling; time profiles are always sampled on a 1/50 second frequency. - + @@ -931,7 +931,7 @@ hp2ps [flags] [<file>[.hp]] The program hp2pshp2ps program converts a heap profile as produced - by the runtime option into a + by the runtime option into a PostScript graph of the heap profile. By convention, the file to be processed by hp2ps has a .hp extension. The PostScript output is @@ -946,36 +946,36 @@ hp2ps [flags] [<file>[.hp]] The flags are: - + - - - + + + In order to make graphs more readable, hp2ps sorts the shaded bands for each identifier. 
The default sort ordering is for the bands with the largest area to be stacked on top of the smaller ones. - The option causes rougher bands (those + The option causes rougher bands (those representing series of values with the largest standard deviations) to be stacked on top of smoother ones. - - + + - - - + + + Normally, hp2ps puts the title of the graph in a small box at the top of the page. However, if the JOB string is too long to fit in a small box (more than 35 characters), then hp2ps will choose to - use a big box instead. The option + use a big box instead. The option forces hp2ps to use a big box. - - + + - - - + + + Generate encapsulated PostScript suitable for inclusion in LaTeX documents. Usually, the PostScript graph is drawn in landscape mode in an area 9 inches wide by 6 @@ -983,112 +983,112 @@ hp2ps [flags] [<file>[.hp]] area to be approximately centred on a sheet of A4 paper. This format is convenient for studying the graph in detail, but it is unsuitable for inclusion in LaTeX documents. The - option causes the graph to be drawn in + option causes the graph to be drawn in portrait mode, with float specifying the width in inches, millimetres or points (the default). The resulting PostScript file conforms to the Encapsulated PostScript (EPS) convention, and it can be included in a LaTeX document using Rokicki's dvi-to-PostScript converter dvips. - - + + - - - + + + Create output suitable for the gs PostScript previewer (or similar). In this case the graph is printed in portrait mode without scaling. The output is unsuitable for a laser printer. - - + + - - - + + + Normally a profile is limited to 20 bands with additional identifiers being grouped into an - OTHER band. The flag + OTHER band. The flag removes this 20-band limit, producing as many bands as necessary. No key is produced as it won't fit! It is useful for displaying creation time profiles with many bands.
- - + + - - - + + + Normally a profile is limited to 20 bands with additional identifiers being grouped into an - OTHER band. The flag + OTHER band. The flag specifies an alternative band limit (the maximum is 20). - requests the band limit to be + requests the band limit to be removed. As many bands as necessary are produced. However no key is produced as it won't fit! It is useful for displaying creation time profiles with many bands. - - + + - - - + + + Use previous parameters. By default, the PostScript graph is automatically scaled both horizontally and vertically so that it fills the page. However, when preparing a series of graphs for use in a presentation, it is often useful to draw a new graph using the same scale, shading and ordering as a previous one. The - flag causes the graph to be drawn using + flag causes the graph to be drawn using the parameters determined by a previous run of hp2ps on file. These are extracted from file@.aux. - - + + - - - + + + Use a small box for the title. - - + + - - - + + + Normally trace elements which sum to a total of less than 1% of the profile are removed from the profile. The option allows this percentage to be modified (maximum 5%). - requests no trace elements to be + requests no trace elements to be removed from the profile, ensuring that all the data will be displayed. - - + + - - - + + + Generate colour output. - - + + - - - + + + Ignore marks. - - + + - - - + + + Print out usage information. - - - + + + @@ -1186,7 +1186,7 @@ This can all be encapsulated in a little script: #!/bin/sh head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ | hp2ps > FOO.ps - gv -watch -seascape FOO.ps & + gv -watch -seascape FOO.ps & while [ 1 ] ; do sleep 10 # We generate a new profile every 10 seconds. 
head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ @@ -1202,7 +1202,7 @@ to re-read its input file: #!/bin/sh head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \ | hp2ps > FOO.ps - gv FOO.ps & + gv FOO.ps & gvpsnum=$! while [ 1 ] ; do sleep 10 @@ -1218,7 +1218,7 @@ to re-read its input file: - Using “ticky-ticky” profiling (for implementors) + Using “ticky-ticky” profiling (for implementors) ticky-ticky profiling (ToDo: document properly.) @@ -1231,7 +1231,7 @@ to re-read its input file: profiling profiling, ticky-ticky because that's the sound a Sun4 makes when it is running up all those counters - (slowly). + (slowly). Ticky-ticky profiling is mainly intended for implementors; it is quite separate from the main “cost-centre” @@ -1243,11 +1243,11 @@ to re-read its input file: the installation guide. To get your compiled program to spit out the ticky-ticky - numbers, use a RTS + numbers, use a RTS option-r RTS option. - See . + See . - Compiling your program with the + Compiling your program with the switch yields an executable that performs these counts. Here is a sample ticky-ticky statistics file, generated by the invocation foo +RTS -rfoo.ticky. @@ -1336,12 +1336,12 @@ Total bytes copied during GC: 190096 The formatting of the information above the row of asterisks is subject to change, but hopefully provides a useful - human-readable summary. Below the asterisks all - counters maintained by the ticky-ticky system are + human-readable summary. Below the asterisks all + counters maintained by the ticky-ticky system are dumped, in a format intended to be machine-readable: zero or more spaces, an integer, a space, the counter name, and a newline. - In fact, not all counters are + In fact, not all counters are necessarily dumped; compile- or run-time flags can render certain counters invalid. 
In this case, either the counter will simply not appear, or it will appear with a modified counter name, @@ -1350,7 +1350,7 @@ Total bytes copied during GC: 190096 with an inserted ! above). Software analysing this output should always check that it has the counters it expects. Also, beware: some of the counters can have - large values! + large values!