X-Git-Url: http://git.megacz.com/?p=ghc-hetmet.git;a=blobdiff_plain;f=docs%2Fcomm%2Fthe-beast%2Fstg.html;fp=docs%2Fcomm%2Fthe-beast%2Fstg.html;h=4581da7d1f88720dd1ffb24dd31d3f66a83f2363;hp=0000000000000000000000000000000000000000;hb=0065d5ab628975892cea1ec7303f968c3338cbe1;hpb=28a464a75e14cece5db40f2765a29348273ff2d2

diff --git a/docs/comm/the-beast/stg.html b/docs/comm/the-beast/stg.html
new file mode 100644
index 0000000..4581da7
--- /dev/null
+++ b/docs/comm/the-beast/stg.html
@@ -0,0 +1,164 @@

The GHC Commentary - You Got Control: The STG-language

GHC contains two completely independent backends: the byte code generator and the machine code generator. The decision over which of the two is invoked is made in HscMain.hscCodeGen. The machine code generator itself proceeds in a number of phases: first, the Core intermediate language is translated into the STG-language; second, the STG-language is transformed into a GHC-internal variant of C--; and third, this is either emitted as concrete C--, converted to GNU C, or translated to native code (by the native code generator, which targets IA32, Sparc, and PowerPC [as of March '05]).

In the following, we will have a look at the first step of machine code generation, namely the translation steps involving the STG-language. Details about the underlying abstract machine, the Spineless Tagless G-machine, are in Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine, S. L. Peyton Jones, Journal of Functional Programming 2(2), Apr 1992, pp. 127-202. (Some details have changed since the publication of this article, but it still gives a good introduction to the main concepts.)


The STG Language

The AST of the STG-language and the generation of STG code from Core are both located in the stgSyn/ directory, in the modules StgSyn and CoreToStg, respectively.

Conceptually, the STG-language is a lambda calculus (including data constructors and case expressions) whose syntax is restricted to make all control flow explicit. As such, it can be regarded as a variant of administrative normal form (ANF). (Cf. The essence of compiling with continuations. Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM Press, 1993.) Each syntactic form has a precise operational interpretation, in addition to the denotational interpretation inherited from the lambda calculus. The concrete representation of the STG language inside GHC also includes auxiliary attributes, such as static reference tables (SRTs), which determine the top-level bindings referenced by each let binding and case expression.

As usual in ANF, arguments to functions etc. are restricted to atoms (i.e., constants or variables), which implies that all sub-expressions are explicitly named and evaluation order is explicit. Specific to the STG language is that all let bindings correspond to closure allocation (thunks, function closures, and data constructors) and that case expressions encode both computation and case selection. There are two flavours of case expressions, scrutinising boxed and unboxed values, respectively. The former perform function calls, including demanding the evaluation of thunks, whereas the latter execute primitive operations (such as arithmetic on fixed-size integers and floating-point numbers).
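The restrictions described above can be made concrete with a toy syntax. The following is a minimal sketch only: the type and constructor names (Atom, Rhs, Expr, Alt, Pat, allocations) are invented for illustration and are much simpler than the real definitions in StgSyn.

```haskell
-- Toy STG-like syntax. All names here are hypothetical; this is NOT
-- GHC's StgSyn, only an illustration of the restrictions it imposes.

-- Atoms are the only things allowed in argument position.
data Atom
  = Var String
  | Lit Int
  deriving (Eq, Show)

-- Right-hand sides of let bindings; each one stands for a closure allocation.
data Rhs
  = Closure [String] Expr   -- thunk (no parameters) or function closure
  | Con String [Atom]       -- saturated data constructor application
  deriving (Eq, Show)

data Expr
  = App String [Atom]        -- call a variable with atomic arguments
  | PrimOp String [Atom]     -- primitive operation on unboxed values
  | Let String Rhs Expr      -- allocate a closure, then continue
  | Case Expr String [Alt]   -- evaluate the scrutinee, bind the result, select
  deriving (Eq, Show)

data Alt = Alt Pat Expr
  deriving (Eq, Show)

data Pat
  = PCon String [String]
  | PLit Int
  | PDefault
  deriving (Eq, Show)

-- "f (g x)" cannot be written directly, because (g x) is not an atom.
-- The nested call is named by a let that allocates a thunk:
--   let t = \[] -> g x in f t
letExample :: Expr
letExample = Let "t" (Closure [] (App "g" [Var "x"])) (App "f" [Var "t"])

-- A case on an unboxed value: run a primitive operation, bind the result
-- to r, and continue. No closure is allocated.
caseExample :: Expr
caseExample =
  Case (PrimOp "+#" [Var "a", Var "b"]) "r" [Alt PDefault (App "f" [Var "r"])]

-- Count the allocation sites (let bindings) that occur in an expression.
allocations :: Expr -> Int
allocations (Let _ rhs body)    = 1 + rhsAllocations rhs + allocations body
allocations (Case scrut _ alts) = allocations scrut + sum [allocations e | Alt _ e <- alts]
allocations (App _ _)           = 0
allocations (PrimOp _ _)        = 0

rhsAllocations :: Rhs -> Int
rhsAllocations (Closure _ body) = allocations body
rhsAllocations (Con _ _)        = 0
```

In this sketch, allocation sites are exactly the Let nodes and evaluation happens only at Case nodes, which is the division of labour the paragraph above describes.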

The representation of the STG language defined in StgSyn abstracts over both binders and occurrences of variables. The type names involved in this generic definition all carry the prefix Gen (such as in GenStgBinding). Instances of these generic definitions, where both binders and occurrences are of type Id.Id, are defined as type synonyms and use type names that drop the Gen prefix (e.g., becoming plain StgBinding). Complete programs in STG form are represented by values of type [StgBinding].
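The shape of this generic-then-instantiated scheme can be sketched as follows. This is a cut-down illustration, not the real StgSyn: the constructor fields are simplified, Id is a hypothetical stand-in for Id.Id, and the helper binders is invented for the example.

```haskell
-- Simplified sketch of the "Gen" pattern. The constructors carry fewer
-- fields than GHC's real ones; treat every detail here as an assumption.

-- Generic forms, parameterised over binders (bndr) and occurrences (occ).
data GenStgBinding bndr occ
  = StgNonRec bndr (GenStgExpr bndr occ)
  | StgRec [(bndr, GenStgExpr bndr occ)]
  deriving Show

data GenStgExpr bndr occ
  = StgApp occ [occ]
  | StgLet (GenStgBinding bndr occ) (GenStgExpr bndr occ)
  deriving Show

-- Hypothetical stand-in for GHC's Id.Id; a plain name is enough here.
newtype Id = Id String
  deriving (Eq, Show)

-- The concrete forms drop the Gen prefix and fix both parameters to Id.
type StgBinding = GenStgBinding Id Id
type StgExpr    = GenStgExpr Id Id

-- A function written once against the generic form works for any instance.
binders :: GenStgBinding bndr occ -> [bndr]
binders (StgNonRec b _) = [b]
binders (StgRec pairs)  = map fst pairs

-- A complete (toy) program is a list of top-level bindings.
program :: [StgBinding]
program =
  [ StgNonRec (Id "main") (StgApp (Id "f") [Id "x"])
  , StgRec [ (Id "even'", StgApp (Id "odd'") [Id "n"])
           , (Id "odd'",  StgApp (Id "even'") [Id "n"]) ]
  ]
```

Keeping the two type parameters separate means that binders and occurrences can be annotated differently, while most consumers only ever see the Id-instantiated synonyms.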


From Core to STG

Although the actual translation from Core AST into STG AST is performed by the function CoreToStg.coreToStg (or CoreToStg.coreExprToStg for individual expressions), the translation crucially depends on CorePrep.corePrepPgm (resp. CorePrep.corePrepExpr), which prepares Core code for code generation (for both byte code and machine code generation). CorePrep saturates primitive and constructor applications, turns the code into A-normal form, renames all identifiers into globally unique names, generates bindings for constructor workers, constructor wrappers, and record selectors, and performs some further cleanup.
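To illustrate what turning code into A-normal form involves, here is a toy conversion on a tiny term language. It is a sketch under stated assumptions, not how CorePrep is implemented: the types Term, Atom and Anf and the functions anf, atom, atoms and toAnf are all invented for this example, and fresh names of the form t0, t1, ... are assumed not to clash with source variables.

```haskell
-- Toy A-normalisation. Every name below is hypothetical.

-- Source terms with arbitrarily nested applications.
data Term
  = TVar String
  | TLit Int
  | TApp Term [Term]
  deriving (Eq, Show)

-- Target terms: calls take only atoms; everything else is let-bound.
data Atom = AVar String | ALit Int
  deriving (Eq, Show)

data Anf
  = Ret Atom
  | Call Atom [Atom]
  | Let String Anf Anf
  deriving (Eq, Show)

-- Counter used to generate fresh names.
type Fresh = Int

-- Convert a term, returning the next unused counter value.
anf :: Fresh -> Term -> (Fresh, Anf)
anf n (TVar v) = (n, Ret (AVar v))
anf n (TLit i) = (n, Ret (ALit i))
anf n (TApp f args) =
  let (n1, bs1, fn)      = atom n f
      (n2, bs2, actuals) = atoms n1 args
  in (n2, foldr (\(x, rhs) body -> Let x rhs body) (Call fn actuals) (bs1 ++ bs2))

-- Reduce one term to an atom, emitting a let binding if it is not atomic.
atom :: Fresh -> Term -> (Fresh, [(String, Anf)], Atom)
atom n (TVar v) = (n, [], AVar v)
atom n (TLit i) = (n, [], ALit i)
atom n t =
  let name      = "t" ++ show n
      (n', rhs) = anf (n + 1) t
  in (n', [(name, rhs)], AVar name)

-- Reduce a list of terms to atoms, left to right.
atoms :: Fresh -> [Term] -> (Fresh, [(String, Anf)], [Atom])
atoms n []       = (n, [], [])
atoms n (t : ts) =
  let (n1, bs1, a)    = atom n t
      (n2, bs2, rest) = atoms n1 ts
  in (n2, bs1 ++ bs2, a : rest)

toAnf :: Term -> Anf
toAnf = snd . anf 0
```

For instance, toAnf applied to the term for f (g x) names the inner call first and then passes the new variable to f, so the resulting term has the let-then-call structure that the STG language requires.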

In other words, after Core code is prepared for code generation it is structurally already in the form required by the STG language. The main work performed by the actual transformation from Core to STG, as performed by CoreToStg.coreToStg, is to compute the live and free variables as well as live CAFs (constant applicative forms) at each let binding and case alternative. In subsequent phases, the live CAF information is used to compute SRTs. The live variable information is used to determine which stack slots need to be zapped (to avoid space leaks), and the free variable information is needed to construct closures. Moreover, hints for optimised code generation are computed, such as whether a closure needs to be updated after it has been evaluated.
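The role of free variables in closure construction can be illustrated with a small, self-contained computation. This is a sketch on a toy expression type, not the algorithm in CoreToStg: Expr and freeVars are invented names, and unlike GHC the toy treats every name uniformly rather than distinguishing top-level bindings, which need not be stored in a closure.

```haskell
import Data.List (union, (\\))

-- Toy expression language; all names here are hypothetical.
data Expr
  = Var String
  | Lit Int
  | App Expr [Expr]
  | Lam [String] Expr
  | Let String Expr Expr   -- non-recursive let
  deriving (Eq, Show)

-- Variables that occur in an expression without being bound inside it.
-- The result lists contain no duplicates.
freeVars :: Expr -> [String]
freeVars (Var v)          = [v]
freeVars (Lit _)          = []
freeVars (App f args)     = foldr (union . freeVars) [] (f : args)
freeVars (Lam ps body)    = freeVars body \\ ps
freeVars (Let x rhs body) = freeVars rhs `union` (freeVars body \\ [x])

-- let f = \y -> add x y in f 1
-- A closure for f has to capture whatever the lambda mentions but
-- does not bind itself.
example :: Expr
example =
  Let "f" (Lam ["y"] (App (Var "add") [Var "x", Var "y"]))
          (App (Var "f") [Lit 1])
```

In the example, the lambda bound to f mentions add, x and y but binds only y, so in this toy model add and x are the variables a closure for f would have to capture.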


STG Passes

These days little actual work is performed on programs in STG form; in particular, the code is not further optimised. All serious optimisation (except low-level optimisations which are performed during native code generation) has already been done on Core. The main task of SimplStg.stg2stg is to compute SRTs from the live CAF information determined during STG generation. Other than that, SCCfinal.stgMassageForProfiling is executed when compiling for profiling, and information may be dumped for debugging purposes.


Towards C--

GHC's internal form of C-- is defined in the module Cmm. The definition is generic in that it abstracts over the type of static data and of the contents of basic blocks (i.e., over the concrete representation of constant data and instructions). These generic definitions have names carrying the prefix Gen (such as GenCmm). The same module also instantiates the generic form to a concrete form where data is represented by CmmStatic and instructions are represented by CmmStmt (giving us, e.g., Cmm from GenCmm). The concrete form more or less follows the external C-- language.

Programs in STG form are translated to Cmm by CodeGen.codeGen.



Last modified: Sat Mar 5 22:55:25 EST 2005