docs/comm/the-beast/stg.html

   1 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
   2 <html>
   3   <head>
   4     <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
   5     <title>The GHC Commentary - You Got Control: The STG-language</title>
   6   </head>
   7
   8   <body BGCOLOR="FFFFFF">
   9     <h1>The GHC Commentary - You Got Control: The STG-language</h1>
  10     <p>
  11       GHC contains two completely independent backends: the byte code
  12       generator and the machine code generator.  The decision over which of
  13       the two is invoked is made in <a
  14       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/HscMain.lhs"><code>HscMain</code></a><code>.hscCodeGen</code>.
  15       The machine code generator proceeds itself in a number of phases: First,
  16       the <a href="desugar.html">Core</a> intermediate language is translated
  17       into <em>STG-language</em>; second, STG-language is transformed into a
  18       GHC-internal variant of <a href="http://www.cminusminus.org/">C--</a>;
  19       and thirdly, this is either emitted as concrete C--, converted to GNU C,
  20       or translated to native code (by the <a href="ncg.html">native code
  21       generator</a> which targets IA32, Sparc, and PowerPC [as of March '5]).
  22     </p>
  23     <p>
  24       In the following, we will have a look at the first step of machine code
  25       generation, namely the translation steps involving the STG-language.
  26       Details about the underlying abstract machine, the <em>Spineless Tagless
  27       G-machine</em>, are in <a
  28       href="http://research.microsoft.com/copyright/accept.asp?path=/users/simonpj/papers/spineless-tagless-gmachine.ps.gz&pub=34">Implementing
  29       lazy functional languages on stock hardware: the Spineless Tagless
  30       G-machine</a>, SL Peyton Jones, Journal of Functional Programming 2(2),
  31       Apr 1992, pp127-202. (Some details have changed since the publication of
  32       this article, but it still gives a good introduction to the main
  33       concepts.)
  34     </p>
  35
  36     <h2>The STG Language</h2>
  37     <p>
  38       The AST of the STG-language and the generation of STG code from Core is
  39       both located in the <a
  40       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/"><code>stgSyn/</code></a>
  41       directory; in the modules <a
  42       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/StgSyn.lhs"><code>StgSyn</code></a>
  43       and <a
  44       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a>,
  45       respectively.
  46     </p>
  47     <p>
  48       Conceptually, the STG-language is a lambda calculus (including data
  49       constructors and case expressions) whose syntax is restricted to make
  50       all control flow explicit.  As such, it can be regarded as a variant of
  51       <em>administrative normal form (ANF).</em> (C.f., <a
  52       href="http://doi.acm.org/10.1145/173262.155113">The essence of compiling
  53       with continuations.</a> Cormac Flanagan, Amr Sabry, Bruce F. Duba, and
  54       Matthias Felleisen. <em>ACM SIGPLAN Conference on Programming Language
  55         Design and Implementation,</em> ACM Press, 1993.)  Each syntactic from
  56       has a precise operational interpretation, in addition to the
  57       denotational interpretation inherited from the lambda calculus.  The
  58       concrete representation of the STG language inside GHC also includes
  59       auxiliary attributes, such as <em>static reference tables (SRTs),</em>
  60       which determine the top-level bindings referenced by each let binding
  61       and case expression.
  62     </p>
  63     <p>
  64       As usual in ANF, arguments to functions etc. are restricted to atoms
  65       (i.e., constants or variables), which implies that all sub-expressions
  66       are explicitly named and evaluation order is explicit.  Specific to the
  67       STG language is that all let bindings correspond to closure allocation
  68       (thunks, function closures, and data constructors) and that case
  69       expressions encode both computation and case selection.  There are two
  70       flavours of case expressions scrutinising boxed and unboxed values,
  71       respectively.  The former perform function calls including demanding the
  72       evaluation of thunks, whereas the latter execute primitive operations
  73       (such as arithmetic on fixed size integers and floating-point numbers).
  74     </p>
  75     <p>
  76       The representation of STG language defined in <a
  77       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/StgSyn.lhs"><code>StgSyn</code></a>
  78       abstracts over both binders and occurences of variables.  The type names
  79       involved in this generic definition all carry the prefix
  80       <code>Gen</code> (such as in <code>GenStgBinding</code>).  Instances of
  81       these generic definitions, where both binders and occurences are of type
  82       <a
  83       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/basicTypes/Id.lhs"><code>Id</code></a><code>.Id</code>
  84       are defined as type synonyms and use type names that drop the
  85       <code>Gen</code> prefix (i.e., becoming plain <code>StgBinding</code>).
  86       Complete programs in STG form are represented by values of type
  87       <code>[StgBinding]</code>.
  88     </p>
  89
  90     <h2>From Core to STG</h2>
  91     <p>
  92       Although, the actual translation from Core AST into STG AST is performed
  93       by the function <a
  94       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.coreToStg</code>
  95       (or <a
  96       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.coreExprToStg</code>
  97       for individual expressions), the translation crucial depends on <a
  98       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/coreSyn/CorePrep.lhs"><code>CorePrep</code></a><code>.corePrepPgm</code>
  99       (resp. <a
 100       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/coreSyn/CorePrep.lhs"><code>CorePrep</code></a><code>.corePrepExpr</code>),
 101       which prepares Core code for code generation (for both byte code and
 102       machine code generation).  <code>CorePrep</code> saturates primitive and
 103       constructor applications, turns the code into A-normal form, renames all
 104       identifiers into globally unique names, generates bindings for
 105       constructor workers, constructor wrappers, and record selectors plus
 106       some further cleanup.
 107     </p>
 108     <p>
 109       In other words, after Core code is prepared for code generation it is
 110       structurally already in the form required by the STG language.  The main
 111       work performed by the actual transformation from Core to STG, as
 112       performed by <a
 113       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.coreToStg</code>,
 114       is to compute the live and free variables as well as live CAFs (constant
 115       applicative forms) at each let binding and case alternative.  In
 116       subsequent phases, the live CAF information is used to compute SRTs.
 117       The live variable information is used to determine which stack slots
 118       need to be zapped (to avoid space leaks) and the free variable
 119       information is need to construct closures.  Moreover, hints for
 120       optimised code generation are computed, such as whether a closure needs
 121       to be updated after is has been evaluated.
 122     </p>
 123
 124     <h2>STG Passes</h2>
 125     <p>
 126       These days little actual work is performed on programs in STG form; in
 127       particular, the code is not further optimised.  All serious optimisation
 128       (except low-level optimisations which are performed during native code
 129       generation) has already been done on Core.  The main task of <a
 130       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/stgSyn/CoreToStg.lhs"><code>CoreToStg</code></a><code>.stg2stg</code>
 131       is to compute SRTs from the live CAF information determined during STG
 132       generation.  Other than that, <a
 133       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/profiling/SCCfinal.lhs"><code>SCCfinal</code></a><code>.stgMassageForProfiling</code>
 134       is executed when compiling for profiling and information may be dumped
 135       for debugging purposes.
 136     </p>
 137
 138     <h2>Towards C--</h2>
 139     <p>
 140       GHC's internal form of C-- is defined in the module <a
 141       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/cmm/Cmm.hs"><code>Cmm</code></a>.
 142       The definition is generic in that it abstracts over the type of static
 143       data and of the contents of basic blocks (i.e., over the concrete
 144       representation of constant data and instructions).  These generic
 145       definitions have names carrying the prefix <code>Gen</code> (such as
 146       <code>GenCmm</code>).  The same module also instantiates the generic
 147       form to a concrete form where data is represented by
 148       <code>CmmStatic</code> and instructions are represented by
 149       <code>CmmStmt</code> (giving us, e.g., <code>Cmm</code> from
 150       <code>GenCmm</code>).  The concrete form more or less follows the
 151       external <a href="http://www.cminusminus.org/">C--</a> language.
 152     </p>
 153     <p>
 154       Programs in STG form are translated to <code>Cmm</code> by <a
 155       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/codeGen/CodeGen.lhs"><code>CodeGen</code></a><code>.codeGen</code>
 156     </p>
 157
 158     <p><hr><small>
 159 <!-- hhmts start -->
 160 Last modified: Sat Mar  5 22:55:25 EST 2005
 161 <!-- hhmts end -->
 162     </small>
 163   </body>
 164 </html>