Debugging the compiler

HACKER TERRITORY. HACKER TERRITORY. (You were warned.)

Replacing the program for one or more phases

You may specify that a different program be used for one of the phases of the compilation system, in place of whatever the driver ghc has wired into it. For example, you might want to try a different assembler. The -pgm<phase-code><program-name> option to ghc will cause it to use <program-name> for phase <phase-code>, where the codes to indicate the phases are:

    code   phase
    L      literate pre-processor
    P      C pre-processor (if -cpp only)
    C      Haskell compiler
    c      C compiler
    a      assembler
    l      linker
    dep    Makefile dependency generator

Forcing options to a particular phase

The preceding sections describe driver options that are mostly applicable to one particular phase. You may also force a specific option <option> to be passed to a particular phase <phase-code> by feeding the driver the option -opt<phase-code><option>. The codes to indicate the phases are the same as in the previous section.

So, for example, to force an -Ewurble option to the assembler, you would tell the driver -opta-Ewurble (the dash before the E is required).

Besides getting options to the Haskell compiler with -optC<blah>, you can get options through to its runtime system with -optCrts<blah>.

So, for example: when I want to use my normal driver but with my profiled compiler binary, I use this script:

    #! /bin/sh
    exec /local/grasp_tmp3/simonpj/ghc-BUILDS/working-alpha/ghc/driver/ghc \
         -pgmC/local/grasp_tmp3/simonpj/ghc-BUILDS/working-hsc-prof/hsc \
         -optCrts-i0.5 \
         -optCrts-PT \
         "$@"

Dumping out compiler intermediate structures

-noC
    Don't bother generating C output or an interface file. Usually used in conjunction with one or more of the -ddump-* options; for example:

        ghc -noC -ddump-simpl Foo.hs

-hi
    Do generate an interface file.
    This would normally be used in conjunction with -noC, which turns off interface generation; thus: -noC -hi.

-dshow-passes
    Prints a message to stderr as each pass starts. Gives a warm but undoubtedly misleading feeling that GHC is telling you what's happening.

-ddump-<pass>
    Make a debugging dump after pass <pass> (may be common enough to need a short form…). You can get all of these at once (lots of output) by using -ddump-all, or most of them with -ddump-most. Some of the most useful ones are:

    -ddump-parsed: parser output
    -ddump-rn: renamer output
    -ddump-minimal-imports: Dump to the file "M.imports" (where M is the module being compiled) a "minimal" set of import declarations. You can safely replace all the import declarations in "M.hs" with those found in "M.imports". Why would you want to do that? Because the "minimal" imports (a) import everything explicitly, by name, and (b) import nothing that is not required. It can be quite painful to maintain this property by hand, so this flag is intended to reduce the labour.
    -ddump-tc: typechecker output
    -ddump-types: Dump a type signature for each value defined at the top level of the module. The list is sorted alphabetically. Using -dppr-debug dumps a type signature for all the imported and system-defined things as well; useful for debugging the compiler.
    -ddump-deriv: derived instances
    -ddump-ds: desugarer output
    -ddump-spec: output of specialisation pass
    -ddump-rules: dumps all rewrite rules (including those generated by the specialisation pass)
    -ddump-simpl: simplifier output (Core-to-Core passes)
    -ddump-usagesp: UsageSP inference pre-inf and output
    -ddump-cpranal: CPR analyser output
    -ddump-stranal: strictness analyser output
    -ddump-workwrap: worker/wrapper split output
    -ddump-occur-anal: `occurrence analysis' output
    -ddump-stg: output of STG-to-STG passes
    -ddump-absC: unflattened Abstract C
    -ddump-flatC: flattened Abstract C
    -ddump-realC: same as what goes to the C compiler
    -ddump-asm: assembly language from the native-code generator

-dverbose-simpl and -dverbose-stg
    Show the output of the intermediate Core-to-Core and STG-to-STG passes, respectively. (Lots of output!) So: when we're really desperate:

        % ghc -noC -O -ddump-simpl -dverbose-simpl -dcore-lint Foo.hs

-ddump-simpl-iterations
    Show the output of each iteration of the simplifier (each run of the simplifier has a maximum number of iterations, normally 4). Used when even -dverbose-simpl doesn't cut it.

-dppr-user, -dppr-debug
    Debugging output is in one of several “styles.” Take the printing of types, for example. In the “user” style, the compiler's internal ideas about types are presented in Haskell source-level syntax, insofar as possible. In the “debug” style (which is the default for debugging output), the types are printed with explicit foralls, and variables have their unique-id attached (so you can check for things that look the same but aren't).

-ddump-simpl-stats
    Dump statistics about how many of each kind of transformation took place. If you add -dppr-debug you get more detailed information.
-ddump-raw-asm
    Dump out the assembly-language stuff, before the “mangler” gets it.

-ddump-rn-trace
    Make the renamer be *real* chatty about what it is up to.

-dshow-rn-stats
    Print out a summary of what kind of information the renamer had to bring in.

-dshow-unused-imports
    Have the renamer report which imports do not contribute.

Checking for consistency

-dcore-lint
    Turn on heavyweight intra-pass sanity-checking within GHC, at Core level. (It checks GHC's sanity, not yours.)

-dstg-lint
    Ditto for STG level.

-dusagesp-lint
    Turn on checks around UsageSP inference (-fusagesp). This verifies various simple properties of the results of the inference, and also warns if any identifier with a used-once annotation before the inference has a used-many annotation afterwards; this could indicate a non-worksafe transformation is being applied.

How to read Core syntax (from some -ddump-* flags)

Let's do this by commenting an example. It's from doing -ddump-ds on this code:

    skip2 m = m : skip2 (m+2)

Before we jump in, a word about names of things. Within GHC, variables, type constructors, etc., are identified by their “Uniques.” These are of the form `letter' plus `number' (both loosely interpreted). The `letter' gives some idea of where the Unique came from; e.g., _ means “built-in type variable”; t means “from the typechecker”; s means “from the simplifier”; and so on. The `number' is printed fairly compactly in a `base-62' format, which everyone hates except me (WDP).

Remember, everything has a “Unique” and it is usually printed out when debugging, in some form or another.
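To make the compact `base-62' numbering a little more concrete, here is a minimal Haskell sketch of rendering a number in base 62. The names digits62 and toBase62 are hypothetical, and the digit alphabet and its ordering (0-9, then a-z, then A-Z) are assumptions made for illustration; GHC's own printer may order its digits differently.

```haskell
-- Assumed 62-character digit alphabet. The ordering used inside GHC may differ.
digits62 :: String
digits62 = ['0'..'9'] ++ ['a'..'z'] ++ ['A'..'Z']

-- Render a non-negative number in base 62, most significant digit first.
toBase62 :: Int -> String
toBase62 n
  | n < 0     = error "toBase62: negative input"
  | n < 62    = [digits62 !! n]
  | otherwise = toBase62 (n `div` 62) ++ [digits62 !! (n `mod` 62)]
```

Under the assumed alphabet, toBase62 62 is "10", which is why Uniques such as r1L6 stay short even when the underlying number is large.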
So here we go…

    Desugared:
    Main.skip2{-r1L6-} :: _forall_ a$_4 =>{{Num a$_4}} -> a$_4 -> [a$_4]

    --# `r1L6' is the Unique for Main.skip2;
    --# `_4' is the Unique for the type-variable (template) `a'
    --# `{{Num a$_4}}' is a dictionary argument

    _NI_

    --# `_NI_' means "no (pragmatic) information" yet; it will later
    --# evolve into the GHC_PRAGMA info that goes into interface files.

    Main.skip2{-r1L6-} =
        /\ _4 -> \ d.Num.t4Gt ->
            let {
              {- CoRec -}
              +.t4Hg :: _4 -> _4 -> _4
              _NI_
              +.t4Hg = (+{-r3JH-} _4) d.Num.t4Gt

              fromInt.t4GS :: Int{-2i-} -> _4
              _NI_
              fromInt.t4GS = (fromInt{-r3JX-} _4) d.Num.t4Gt

    --# The `+' class method (Unique: r3JH) selects the addition code
    --# from a `Num' dictionary (now an explicit lambda'd argument).
    --# Because Core is 2nd-order lambda-calculus, type applications
    --# and lambdas (/\) are explicit. So `+' is first applied to a
    --# type (`_4'), then to a dictionary, yielding the actual addition
    --# function that we will use subsequently...

    --# We play the exact same game with the (non-standard) class method
    --# `fromInt'. Unsurprisingly, the type `Int' is wired into the
    --# compiler.

              lit.t4Hb :: _4
              _NI_
              lit.t4Hb =
                  let {
                    ds.d4Qz :: Int{-2i-}
                    _NI_
                    ds.d4Qz = I#! 2#
                  } in  fromInt.t4GS ds.d4Qz

    --# `I# 2#' is just the literal Int `2'; it reflects the fact that
    --# GHC defines `data Int = I# Int#', where Int# is the primitive
    --# unboxed type. (see relevant info about unboxed types elsewhere...)

    --# The `!' after `I#' indicates that this is a *saturated*
    --# application of the `I#' data constructor (i.e., not partially
    --# applied).

              skip2.t3Ja :: _4 -> [_4]
              _NI_
              skip2.t3Ja =
                  \ m.r1H4 ->
                      let { ds.d4QQ :: [_4]
                            _NI_
                            ds.d4QQ =
                                let {
                                  ds.d4QY :: _4
                                  _NI_
                                  ds.d4QY = +.t4Hg m.r1H4 lit.t4Hb
                                } in  skip2.t3Ja ds.d4QY
                      } in
                      :! _4 m.r1H4 ds.d4QQ

              {- end CoRec -}
            } in  skip2.t3Ja

(“It's just a simple functional language” is an unregisterised trademark of Peyton Jones Enterprises, plc.)
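As a rough source-level analogy for the desugared output above (a sketch only, not what GHC generates), the same dictionary-passing structure can be written by hand in Haskell. The names NumDict, dPlus, dFromInt, numDictInt, and skip2Dict are hypothetical, introduced purely for illustration; the record stands in for the `{{Num a$_4}}' dictionary argument, and the local bindings mirror `+.t4Hg', `lit.t4Hb', and `skip2.t3Ja'.

```haskell
-- Hypothetical stand-in for the Num dictionary: only the two methods skip2 needs.
data NumDict a = NumDict
  { dPlus    :: a -> a -> a   -- plays the role of the `+' class method
  , dFromInt :: Int -> a      -- plays the role of the (non-standard) `fromInt'
  }

-- A hand-built dictionary for Int (an assumption made for this example).
numDictInt :: NumDict Int
numDictInt = NumDict { dPlus = (+), dFromInt = id }

-- skip2 with the dictionary as an explicit argument, as in the Core dump.
skip2Dict :: NumDict a -> a -> [a]
skip2Dict d = go
  where
    plus = dPlus d               -- cf. +.t4Hg: select the method from the dictionary
    lit  = dFromInt d 2          -- cf. lit.t4Hb: the literal 2, built via fromInt
    go m = m : go (plus m lit)   -- cf. skip2.t3Ja: the recursive worker
```

Loaded into GHCi, take 3 (skip2Dict numDictInt 1) evaluates to [1,3,5]. The type abstraction (/\) has no counterpart here because source Haskell leaves type application implicit.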
Command line options in source files

Sometimes it is useful to make the connection between a source file and the command-line options it requires quite tight. For instance, if a (Glasgow) Haskell source file uses casms, the C back-end often needs to be told about which header files to include. Rather than maintaining the list of files the source depends on in a Makefile (using the -#include command-line option), it is possible to do this directly in the source file using the OPTIONS pragma:

    {-# OPTIONS -#include "foo.h" #-}
    module X where

    ...

OPTIONS pragmas are only looked for at the top of your source files, up to the first (non-literate, non-empty) line not containing OPTIONS. Multiple OPTIONS pragmas are recognised. Note that your command shell does not get to see the source-file options; they are just included literally in the array of command-line arguments the compiler driver maintains internally, so you'll be desperately disappointed if you try to glob etc. inside OPTIONS.

NOTE: the contents of OPTIONS are prepended to the command-line options, so you *do* have the ability to override OPTIONS settings via the command line.

It is not recommended to move all the contents of your Makefiles into your source files, but in some circumstances, the OPTIONS pragma is the Right Thing. (If you keep the generated .hc file and have OPTIONS flags in your module, the OPTIONS will get put into the generated .hc file.)

Unregisterised compilation

The term "unregisterised" really means "compile via vanilla C", disabling some of the platform-specific tricks that GHC normally uses to make programs go faster. When compiling unregisterised, GHC simply generates a C file which is compiled via gcc.
Unregisterised compilation can be useful when porting GHC to a new machine, since it reduces the prerequisite tools to gcc, as, and ld and nothing more. Furthermore, the amount of platform-specific code that needs to be written in order to get unregisterised compilation going is usually fairly small.

-unreg
    Compile via vanilla ANSI C only, turning off platform-specific optimisations. NOTE: in order to use -unreg, you need to have a set of libraries (including the RTS) built for unregisterised compilation. This amounts to building GHC with way "u" enabled.