X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=docs%2Fusers_guide%2Fusing.xml;h=14665af22947370522ff031da166842eace65b32;hb=ea283aa74e6fd2bec2b88eae19908bba903adea1;hp=8cbcd35fca6cb6f6dcf01dbb8865bcaa9dc42ea1;hpb=0065d5ab628975892cea1ec7303f968c3338cbe1;p=ghc-hetmet.git diff --git a/docs/users_guide/using.xml b/docs/users_guide/using.xml index 8cbcd35..14665af 100644 --- a/docs/users_guide/using.xml +++ b/docs/users_guide/using.xml @@ -793,8 +793,7 @@ ghc -c Foo.hs Provides the standard warnings plus , , - , - , and + , and . @@ -929,19 +928,6 @@ f foo = foo { x = 6 } - : - - - - Turns on warnings for various harmless but untidy - things. This currently includes: importing a type with - (..) when the export is abstract, and - listing duplicate class assertions in a qualified type. - - - - - : missing fields, warning @@ -1383,6 +1369,16 @@ f "2" = 2 Turns off the full laziness optimisation (also known as let-floating). Full laziness increases sharing, which can lead to increased memory residency. + + NOTE: GHC doesn't implement complete full-laziness. + When optimisation is on, and + is not given, some + transformations that increase sharing are performed, such + as extracting repeated computations from a loop. These + are the same transformations that a fully lazy + implementation would do; the difference is that GHC + doesn't consistently apply full-laziness, so don't rely on + it. @@ -1514,362 +1510,92 @@ f "2" = 2 every 4k of allocation). With or , context switches will occur as often as possible (at every heap block allocation). By default, context - switches occur every 20ms. Note that GHC's internal timer ticks - every 20ms, and the context switch timer is always a multiple of - this timer, so 20ms is the maximum granularity available for timed - context switches. + switches occur every 20ms. 
- -Using parallel Haskell - - -Parallel Haskellusing -[NOTE: GHC does not support Parallel Haskell by default, you need to - obtain a special version of GHC from the GPH site. Also, -you won't be able to execute parallel Haskell programs unless PVM3 -(parallel Virtual Machine, version 3) is installed at your site.] - - - -To compile a Haskell program for parallel execution under PVM, use the - option,-parallel -option both when compiling and -linking. You will probably want to import -Control.Parallel into your Haskell modules. - - - -To run your parallel program, once PVM is going, just invoke it -“as normal”. The main extra RTS option is -, to say how many PVM -“processors” your program to run on. (For more details of -all relevant RTS options, please see .) - - - -In truth, running parallel Haskell programs and getting information -out of them (e.g., parallelism profiles) is a battle with the vagaries of -PVM, detailed in the following sections. - - - -Dummy's guide to using PVM - - -PVM, how to use -parallel Haskell—PVM use -Before you can run a parallel program under PVM, you must set the -required environment variables (PVM's idea, not ours); something like, -probably in your .cshrc or equivalent: - - -setenv PVM_ROOT /wherever/you/put/it -setenv PVM_ARCH `$PVM_ROOT/lib/pvmgetarch` -setenv PVM_DPATH $PVM_ROOT/lib/pvmd - - - - - -Creating and/or controlling your “parallel machine” is a purely-PVM -business; nothing specific to parallel Haskell. The following paragraphs -describe how to configure your parallel machine interactively. - - - -If you use parallel Haskell regularly on the same machine configuration it -is a good idea to maintain a file with all machine names and to make the -environment variable PVM_HOST_FILE point to this file. Then you can avoid -the interactive operations described below by just saying - - - -pvm $PVM_HOST_FILE - - - -You use the pvmpvm command command to start PVM on your -machine. 
You can then do various things to control/monitor your -“parallel machine;” the most useful being: - - - - - - - - - -ControlD -exit pvm, leaving it running - - - -halt -kill off this “parallel machine” & exit - - - -add <host> -add <host> as a processor - - - -delete <host> -delete <host> - - - -reset -kill what's going, but leave PVM up - - - -conf -list the current configuration - - - -ps -report processes' status - - - -pstat <pid> -status of a particular process - - - - - - - - -The PVM documentation can tell you much, much more about pvm! - - - - - -parallelism profiles - - -parallelism profiles -profiles, parallelism -visualisation tools - - - -With parallel Haskell programs, we usually don't care about the -results—only with “how parallel” it was! We want pretty pictures. - - - -parallelism profiles (à la hbcpp) can be generated with the --qP RTS option RTS option. The -per-processor profiling info is dumped into files named -<full-path><program>.gr. These are then munged into a PostScript picture, -which you can then display. For example, to run your program -a.out on 8 processors, then view the parallelism profile, do: - - - - - -$ ./a.out +RTS -qP -qp8 -$ grs2gr *.???.gr > temp.gr # combine the 8 .gr files into one -$ gr2ps -O temp.gr # cvt to .ps; output in temp.ps -$ ghostview -seascape temp.ps # look at it! - + + Using SMP parallelism + parallelism + + SMP + - - - -The scripts for processing the parallelism profiles are distributed -in ghc/utils/parallel/. - - - - - -Other useful info about running parallel programs - - -The “garbage-collection statistics” RTS options can be useful for -seeing what parallel programs are doing. If you do either --Sstderr RTS option or , then -you'll get mutator, garbage-collection, etc., times on standard -error. The standard error of all PE's other than the `main thread' -appears in /tmp/pvml.nnn, courtesy of PVM. - - - -Whether doing or not, a handy way to watch -what's happening overall is: tail -f /tmp/pvml.nnn. 
- - - - - -RTS options for Parallel Haskell - - - -RTS options, parallel -parallel Haskell—RTS options - - - -Besides the usual runtime system (RTS) options -(), there are a few options particularly -for parallel execution. - - - - - - -: - - --qp<N> RTS option -(paraLLEL ONLY) Use <N> PVM processors to run this program; -the default is 2. - - - - -: - - --C<s> RTS option Sets -the context switch interval to <s> seconds. -A context switch will occur at the next heap block allocation after -the timer expires (a heap block allocation occurs every 4k of -allocation). With or , -context switches will occur as often as possible (at every heap block -allocation). By default, context switches occur every 20ms. Note that GHC's internal timer ticks every 20ms, and -the context switch timer is always a multiple of this timer, so 20ms -is the maximum granularity available for timed context switches. - - - - -: - - --q RTS option -(paraLLEL ONLY) Produce a quasi-parallel profile of thread activity, -in the file <program>.qp. In the style of hbcpp, this profile -records the movement of threads between the green (runnable) and red -(blocked) queues. If you specify the verbose suboption (), the -green queue is split into green (for the currently running thread -only) and amber (for other runnable threads). We do not recommend -that you use the verbose suboption if you are planning to use the -hbcpp profiling tools or if you are context switching at every heap -check (with ). ---> - - - - -: - - --qt<num> RTS option -(paraLLEL ONLY) Limit the thread pool size, i.e. the number of -threads per processor to <num>. The default is -32. Each thread requires slightly over 1K words in -the heap for thread state and stack objects. (For 32-bit machines, this -translates to 4K bytes, and for 64-bit machines, 8K bytes.) - - - - - -: - - --qe<num> RTS option -(parallel) (paraLLEL ONLY) Limit the spark pool size -i.e. the number of pending sparks per processor to -<num>. The default is 100. 
A larger number may be -appropriate if your program generates large amounts of parallelism -initially. - - - - -: - - --qQ<num> RTS option (parallel) -(paraLLEL ONLY) Set the size of packets transmitted between processors -to <num>. The default is 1024 words. A larger number may be -appropriate if your machine has a high communication cost relative to -computation speed. - - - - -: - - --qh<num> RTS option (parallel) -(paraLLEL ONLY) Select a packing scheme. Set the number of non-root thunks to pack in one packet to -<num>-1 (0 means infinity). By default GUM uses full-subgraph -packing, i.e. the entire subgraph with the requested closure as root is -transmitted (provided it fits into one packet). Choosing a smaller value -reduces the amount of pre-fetching of work done in GUM. This can be -advantageous for improving data locality but it can also worsen the balance -of the load in the system. - - - - -: - - --qg<num> RTS option -(parallel) (paraLLEL ONLY) Select a globalisation -scheme. This option affects the -generation of global addresses when transferring data. Global addresses are -globally unique identifiers required to maintain sharing in the distributed -graph structure. Currently this is a binary option. With <num>=0 full globalisation is used -(default). This means a global address is generated for every closure that -is transmitted. With <num>=1 a thunk-only globalisation scheme is -used, which generated global address only for thunks. The latter case may -lose sharing of data but has a reduced overhead in packing graph structures -and maintaining internal tables of global addresses. - - - - - - - + GHC supports running Haskell programs in parallel on an SMP + (symmetric multiprocessor). + + There's a fine distinction between + concurrency and parallelism: + parallelism is all about making your program run + faster by making use of multiple processors + simultaneously. 
Concurrency, on the other hand, is a means of + abstraction: it is a convenient way to structure a program that must + respond to multiple asynchronous events. + + However, the two terms are certainly related. By making use of + multiple CPUs it is possible to run concurrent threads in parallel, + and this is exactly what GHC's SMP parallelism support does. But it + is also possible to obtain performance improvements with parallelism + on programs that do not use concurrency. This section describes how to + use GHC to compile and run parallel programs; in we describe the language features that affect + parallelism. + + + Options to enable SMP parallelism - + In order to make use of multiple CPUs, your program must be + linked with the option (see ). Then, to run a program on multiple + CPUs, use the RTS option: + + + + + + RTS option + Use x simultaneous threads when + running the program. Normally x + should be chosen to match the number of CPU cores on the machine. + There is no means (currently) by which this value may vary after + the program has started. + + For example, on a dual-core machine we would probably use + +RTS -N2 -RTS. + + Whether hyperthreading cores should be counted or not is an + open question; please feel free to experiment and let us know what + results you find. + + + + + + + Hints for using SMP parallelism + + Add the -sstderr RTS option when + running the program to see timing stats, which will help to tell you + whether your program got faster by using more CPUs or not. If the user + time is greater than + the elapsed time, then the program used more than one CPU. You should + also run the program without -N for comparison. + + GHC's parallelism support is new and experimental. It may make your + program go faster, or it might slow it down - either way, we'd be + interested to hear from you. 
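As an illustration of the "concurrent threads run in parallel" case described in the added text, here is a minimal sketch that is not part of the patch. It uses only Control.Concurrent and Control.Exception from base; the function name sumChunksInParallel and the input ranges are hypothetical, chosen for illustration. It assumes a GHC recent enough to provide getNumCapabilities.

```haskell
module Main (main) where

import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- Hypothetical helper: sum each chunk in its own Haskell thread and
-- collect the results in the original order.
sumChunksInParallel :: [[Int]] -> IO [Int]
sumChunksInParallel chunks = do
  boxes <- mapM spawn chunks   -- start one thread per chunk
  mapM takeMVar boxes          -- block until every thread has reported
  where
    spawn chunk = do
      box <- newEmptyMVar
      -- 'evaluate' forces the sum inside the forked thread, so the work
      -- happens there rather than lazily in the thread that reads the MVar.
      _ <- forkIO (evaluate (sum chunk) >>= putMVar box)
      return box

main :: IO ()
main = do
  -- Number of simultaneous threads the RTS was started with (the -N value).
  caps <- getNumCapabilities
  putStrLn ("capabilities: " ++ show caps)
  totals <- sumChunksInParallel [[1 .. 1000000], [1000001 .. 2000000]]
  print (sum totals)
```

To try it, link with -threaded and run the resulting binary with +RTS -N2 -sstderr -RTS, then again without -N, and compare user time against elapsed time as the hints above suggest. More recent GHC releases may also require -rtsopts at link time before -N is accepted on the command line.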
+ + One significant limitation with the current implementation is that + the garbage collector is still single-threaded, and all execution must + stop when GC takes place. This can be a significant bottleneck in a + parallel program, especially if your program does a lot of GC. If this + happens to you, then try reducing the cost of GC by tweaking the GC + settings (): enlarging the heap or the + allocation area size is a good start. + + Platform-specific Flags @@ -1884,18 +1610,6 @@ and maintaining internal tables of global addresses. - : - - (SPARC machines)-mv8 option (SPARC - only) Means to pass the like-named - option to GCC; it says to use the Version 8 SPARC - instructions, notably integer multiply and divide. The - similar GCC options for SPARC also - work, actually. - - - - : (iX86 machines)-monly-N-regs