From: chak
Date: Sat, 5 Mar 2005 11:58:41 +0000 (+0000)
Subject: [project @ 2005-03-05 11:58:41 by chak]
X-Git-Tag: Initial_conversion_from_CVS_complete~972
X-Git-Url: http://git.megacz.com/?p=ghc-hetmet.git;a=commitdiff_plain;h=09d72f1d5410c8c25faed86320ef5df923797389

[project @ 2005-03-05 11:58:41 by chak]
Extended the commentary with a section about the STG-related parts of GHC
(generation of STG from Core, STG passes, and generation of Cmm).

[BTW, it's a pity that nobody bothered to write up the new code generation
structure when it was implemented not long ago.]
---

diff --git a/ghc/docs/comm/index.html b/ghc/docs/comm/index.html
index 09357b2..1cac3cf 100644
--- a/ghc/docs/comm/index.html
+++ b/ghc/docs/comm/index.html
@@ -6,7 +6,7 @@
-

The Glasgow Haskell Compiler (GHC) Commentary [v0.14]

+

The Glasgow Haskell Compiler (GHC) Commentary [v0.15]

-Last modified: Sat Sep 13 01:15:05 BST 2003
+Last modified: Sat Mar 5 19:52:33 EST 2005
diff --git a/ghc/docs/comm/the-beast/stg.html b/ghc/docs/comm/the-beast/stg.html
new file mode 100644
index 0000000..4581da7
--- /dev/null
+++ b/ghc/docs/comm/the-beast/stg.html
@@ -0,0 +1,164 @@
+
+
+
+
+ The GHC Commentary - You Got Control: The STG-language
+
+
+
+

The GHC Commentary - You Got Control: The STG-language

+

+ GHC contains two completely independent backends: the byte code
+ generator and the machine code generator. The decision over which of
+ the two is invoked is made in HscMain.hscCodeGen.
+ The machine code generator itself proceeds in a number of phases: first,
+ the Core intermediate language is translated
+ into STG-language; second, STG-language is transformed into a
+ GHC-internal variant of C--;
+ and third, this is either emitted as concrete C--, converted to GNU C,
+ or translated to native code (by the native code
+ generator, which targets IA32, Sparc, and PowerPC [as of March '05]).
+

+

+ In the following, we will have a look at the first steps of machine code
+ generation, namely the translation steps involving the STG-language.
+ Details about the underlying abstract machine, the Spineless Tagless
+ G-machine, are in Implementing
+ lazy functional languages on stock hardware: the Spineless Tagless
+ G-machine, S. L. Peyton Jones, Journal of Functional Programming 2(2),
+ Apr 1992, pp. 127-202. (Some details have changed since the publication of
+ this article, but it still gives a good introduction to the main
+ concepts.)
+

+ +

The STG Language

+

+ The AST of the STG-language and the generation of STG code from Core are
+ both located in the stgSyn/
+ directory, in the modules StgSyn
+ and CoreToStg,
+ respectively.
+

+

+ Conceptually, the STG-language is a lambda calculus (including data
+ constructors and case expressions) whose syntax is restricted to make
+ all control flow explicit. As such, it can be regarded as a variant of
+ administrative normal form (ANF). (Cf. The essence of compiling
+ with continuations. Cormac Flanagan, Amr Sabry, Bruce F. Duba, and
+ Matthias Felleisen. ACM SIGPLAN Conference on Programming Language
+ Design and Implementation, ACM Press, 1993.) Each syntactic form
+ has a precise operational interpretation, in addition to the
+ denotational interpretation inherited from the lambda calculus. The
+ concrete representation of the STG language inside GHC also includes
+ auxiliary attributes, such as static reference tables (SRTs),
+ which determine the top-level bindings referenced by each let binding
+ and case expression.
+

+

+ As usual in ANF, arguments to functions etc. are restricted to atoms
+ (i.e., constants or variables), which implies that all sub-expressions
+ are explicitly named and evaluation order is explicit. Specific to the
+ STG language is that all let bindings correspond to closure allocation
+ (thunks, function closures, and data constructors) and that case
+ expressions encode both computation and case selection. There are two
+ flavours of case expressions, scrutinising boxed and unboxed values,
+ respectively. The former perform function calls, including demanding the
+ evaluation of thunks, whereas the latter execute primitive operations
+ (such as arithmetic on fixed-size integers and floating-point numbers).
+
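The restrictions described above can be pictured with a small toy model. This is only a sketch: the types `Atom`, `Expr`, `Alt`, and `Pat`, the function `allocations`, and the names in `example` are all hypothetical and are not the definitions found in StgSyn. It shows arguments restricted to atoms, `Let` standing for closure allocation, and `Case` standing for evaluation followed by selection.

```haskell
-- Toy model only: these types are hypothetical and are NOT those of StgSyn.

-- Atoms are the only things allowed in argument position.
data Atom
  = Var String
  | Lit Int
  deriving (Eq, Show)

data Expr
  = App String [Atom]        -- call a named function on atomic arguments
  | PrimOp String [Atom]     -- primitive operation on unboxed values
  | Let String Expr Expr     -- allocate a closure for the right-hand side
  | Case Expr String [Alt]   -- evaluate the scrutinee, bind the result, select
  deriving (Eq, Show)

data Alt = Alt Pat Expr
  deriving (Eq, Show)

data Pat
  = ConPat String [String]
  | LitPat Int
  | Default
  deriving (Eq, Show)

-- Roughly "f (g x) (y + 1)" with every sub-expression named:
-- the call to g is allocated as a closure, the addition is forced by a case.
example :: Expr
example =
  Let "t" (App "g" [Var "x"]) $
    Case (PrimOp "+#" [Var "y", Lit 1]) "n"
      [Alt Default (App "f" [Var "t", Var "n"])]

-- In this toy model every Let is one closure allocation.
allocations :: Expr -> Int
allocations (Let _ rhs body) = 1 + allocations rhs + allocations body
allocations (Case scrut _ alts) =
  allocations scrut + sum [allocations e | Alt _ e <- alts]
allocations _ = 0

main :: IO ()
main = print (allocations example)
```

Because arguments have type `Atom`, a nested call such as `f (g x)` cannot even be written in this model without first naming `g x`, which is the point of the atom restriction.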

+

+ The representation of the STG language defined in StgSyn
+ abstracts over both binders and occurrences of variables. The type names
+ involved in this generic definition all carry the prefix
+ Gen (such as in GenStgBinding). Instances of
+ these generic definitions, where both binders and occurrences are of type
+ Id.Id
+ are defined as type synonyms and use type names that drop the
+ Gen prefix (i.e., becoming plain StgBinding).
+ Complete programs in STG form are represented by values of type
+ [StgBinding].
+
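The pattern of a generic definition plus type synonyms can be sketched as follows. Everything here is a simplified stand-in: `Id`, `GenBinding`, `GenExpr`, their constructors, and `binders` are hypothetical names chosen for illustration, and the real types in StgSyn carry considerably more information.

```haskell
-- Hypothetical stand-in for Id.Id; the real type is much richer.
newtype Id = Id String
  deriving (Eq, Show)

-- Generic forms, abstracting over binders (bndr) and occurrences (occ).
data GenBinding bndr occ
  = NonRec bndr (GenExpr bndr occ)
  | Rec [(bndr, GenExpr bndr occ)]
  deriving Show

data GenExpr bndr occ
  = Var occ
  | App occ [occ]
  | Let (GenBinding bndr occ) (GenExpr bndr occ)
  deriving Show

-- Concrete forms: the same types with both parameters fixed to Id,
-- under names that drop the Gen prefix.
type Binding = GenBinding Id Id
type Expr    = GenExpr Id Id
type Program = [Binding]

-- Functions written against the generic form work for every instance.
binders :: GenBinding bndr occ -> [bndr]
binders (NonRec b _) = [b]
binders (Rec pairs)  = map fst pairs

program :: Program
program = [NonRec (Id "main") (App (Id "f") [Id "x"])]

main :: IO ()
main = print (concatMap binders program)
```

Keeping binders and occurrences as separate parameters lets a pass annotate one of them (for example, attaching extra information to binders) without touching the other.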

+ +

From Core to STG

+

+ Although the actual translation from Core AST into STG AST is performed
+ by the function CoreToStg.coreToStg
+ (or CoreToStg.coreExprToStg
+ for individual expressions), the translation crucially depends on CorePrep.corePrepPgm
+ (resp. CorePrep.corePrepExpr),
+ which prepares Core code for code generation (for both byte code and
+ machine code generation). CorePrep saturates primitive and
+ constructor applications, turns the code into A-normal form, renames all
+ identifiers into globally unique names, generates bindings for
+ constructor workers, constructor wrappers, and record selectors, and
+ performs some further cleanup.
+
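To illustrate what turning code into A-normal form involves, here is a minimal sketch on a toy expression language. It is not how CorePrep is implemented: the types `Src`, `Atom`, and `Anf`, the functions `toAnf`, `go`, and `atomise`, and the `t0`, `t1`, ... naming scheme are all assumptions made for this example, and the sketch assumes source programs do not already use names of that shape.

```haskell
-- Source language: arguments may be arbitrary nested calls.
data Src
  = SVar String
  | SLit Int
  | SCall String [Src]
  deriving Show

-- Target language: arguments must be atoms.
data Atom = AVar String | ALit Int
  deriving (Eq, Show)

data Anf
  = Ret Atom
  | Call String [Atom]
  | Let String Anf Anf
  deriving (Eq, Show)

toAnf :: Src -> Anf
toAnf e = fst (go e 0)

-- The Int is a supply of fresh name suffixes, threaded through by hand.
go :: Src -> Int -> (Anf, Int)
go (SVar v) n = (Ret (AVar v), n)
go (SLit i) n = (Ret (ALit i), n)
go (SCall f args) n =
  let (binds, atoms, n') = atomise args n
   in (foldr (\(v, rhs) body -> Let v rhs body) (Call f atoms) binds, n')

-- Replace each non-atomic argument by a fresh variable and
-- return the bindings that must be wrapped around the call.
atomise :: [Src] -> Int -> ([(String, Anf)], [Atom], Int)
atomise [] n = ([], [], n)
atomise (a : rest) n = case a of
  SVar v ->
    let (bs, ats, n') = atomise rest n
     in (bs, AVar v : ats, n')
  SLit i ->
    let (bs, ats, n') = atomise rest n
     in (bs, ALit i : ats, n')
  _ ->
    let (rhs, n1) = go a n
        name = "t" ++ show n1
        (bs, ats, n2) = atomise rest (n1 + 1)
     in ((name, rhs) : bs, AVar name : ats, n2)

main :: IO ()
main = print (toAnf (SCall "f" [SCall "g" [SVar "x"], SLit 1]))
```

For the input `f (g x) 1`, the nested call `g x` is bound to a fresh variable and the outer call receives only atoms.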

+

+ In other words, after Core code is prepared for code generation it is
+ structurally already in the form required by the STG language. The main
+ work performed by the actual transformation from Core to STG, as
+ performed by CoreToStg.coreToStg,
+ is to compute the live and free variables as well as live CAFs (constant
+ applicative forms) at each let binding and case alternative. In
+ subsequent phases, the live CAF information is used to compute SRTs.
+ The live variable information is used to determine which stack slots
+ need to be zapped (to avoid space leaks) and the free variable
+ information is needed to construct closures. Moreover, hints for
+ optimised code generation are computed, such as whether a closure needs
+ to be updated after it has been evaluated.
+
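As an illustration of the free variable information mentioned above, the following sketch computes free variables for a toy language. It is not the algorithm used in CoreToStg: the `Expr` type and `freeVars` are hypothetical, the `Let` is non-recursive, and the sketch treats all names alike, whereas a real compiler distinguishes top-level bindings from local ones.

```haskell
import Data.List (nub, sort, union)

-- Toy expression language with ANF-style atomic arguments.
data Expr
  = Var String
  | App String [String]      -- function and arguments are all variables
  | Lam [String] Expr        -- parameters bind in the body
  | Let String Expr Expr     -- non-recursive: binder scopes over the body only
  deriving Show

-- Variables that occur in an expression without being bound inside it.
-- These are what a closure for that expression would have to capture.
freeVars :: Expr -> [String]
freeVars (Var v) = [v]
freeVars (App f args) = nub (f : args)
freeVars (Lam params body) = filter (`notElem` params) (freeVars body)
freeVars (Let b rhs body) = freeVars rhs `union` filter (/= b) (freeVars body)

example :: Expr
example = Let "t" (App "g" ["x"]) (Lam ["y"] (App "f" ["t", "y", "z"]))

main :: IO ()
main = print (sort (freeVars example))
```

In `example`, `t` and `y` are bound inside the expression, so only `f`, `g`, `x`, and `z` are reported as free.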

+ +

STG Passes

+

+ These days little actual work is performed on programs in STG form; in
+ particular, the code is not further optimised. All serious optimisation
+ (except low-level optimisations which are performed during native code
+ generation) has already been done on Core. The main task of CoreToStg.stg2stg
+ is to compute SRTs from the live CAF information determined during STG
+ generation. Other than that, SCCfinal.stgMassageForProfiling
+ is executed when compiling for profiling and information may be dumped
+ for debugging purposes.
+

+ +

Towards C--

+

+ GHC's internal form of C-- is defined in the module Cmm.
+ The definition is generic in that it abstracts over the type of static
+ data and of the contents of basic blocks (i.e., over the concrete
+ representation of constant data and instructions). These generic
+ definitions have names carrying the prefix Gen (such as
+ GenCmm). The same module also instantiates the generic
+ form to a concrete form where data is represented by
+ CmmStatic and instructions are represented by
+ CmmStmt (giving us, e.g., Cmm from
+ GenCmm). The concrete form more or less follows the
+ external C-- language.
+

+

+ Programs in STG form are translated to Cmm by CodeGen.codeGen
+

+ +


+
+Last modified: Sat Mar 5 22:55:25 EST 2005
+
+
+
+