GHC contains two completely independent backends: the byte code generator and the machine code generator. The decision over which of the two is invoked is made in HscMain.hscCodeGen. The machine code generator itself proceeds in a number of phases: first, the Core intermediate language is translated into STG-language; second, STG-language is transformed into a GHC-internal variant of C--; and third, this is either emitted as concrete C--, converted to GNU C, or translated to native code (by the native code generator, which targets IA32, Sparc, and PowerPC [as of March '05]).
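To make the ordering of these phases concrete, here is a minimal Haskell sketch that only models the stages a program passes through. All names in it (Backend, CmmOutput, Stage, pipeline) are hypothetical and do not correspond to GHC's own data types. The byte code path is recorded only as starting from Core, because this section does not describe it further.

```haskell
-- A toy model of the backend choice and of the phases of machine code
-- generation. All names are hypothetical, not GHC's own types.

-- | GHC's two independent backends.
data Backend = ByteCode | MachineCode
  deriving (Eq, Show)

-- | The three ways in which the final C-- may be consumed.
data CmmOutput = EmitCmm | ViaGnuC | NativeCode
  deriving (Eq, Show)

-- | Intermediate representations a program passes through.
data Stage = Core | Stg | Cmm | Final CmmOutput
  deriving (Eq, Show)

-- | The stages visited for a given backend and output choice.
pipeline :: Backend -> CmmOutput -> [Stage]
-- The byte code generator is not covered here; we only record that it
-- starts from (prepared) Core, so the output choice is irrelevant.
pipeline ByteCode _ = [Core]
-- Machine code: Core -> STG -> GHC-internal C-- -> one of three outputs.
pipeline MachineCode out = [Core, Stg, Cmm, Final out]
```

For example, `pipeline MachineCode NativeCode` yields `[Core, Stg, Cmm, Final NativeCode]`.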
In the following, we look at the first step of machine code generation, namely the translation steps involving the STG-language. Details about the underlying abstract machine, the Spineless Tagless G-machine, are in "Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine", SL Peyton Jones, Journal of Functional Programming 2(2), Apr 1992, pp. 127-202. (Some details have changed since the publication of this article, but it still gives a good introduction to the main concepts.)
The AST of the STG-language and the generation of STG code from Core are both located in the stgSyn/ directory, in the modules StgSyn and CoreToStg, respectively.
Conceptually, the STG-language is a lambda calculus (including data constructors and case expressions) whose syntax is restricted to make all control flow explicit. As such, it can be regarded as a variant of administrative normal form (ANF). (Cf. "The essence of compiling with continuations", Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen, ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM Press, 1993.) Each syntactic form has a precise operational interpretation, in addition to the denotational interpretation inherited from the lambda calculus. The concrete representation of the STG language inside GHC also includes auxiliary attributes, such as static reference tables (SRTs), which determine the top-level bindings referenced by each let binding and case expression.
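The following sketch illustrates what administrative normal form means. It converts a tiny expression language, in which arguments may be arbitrary expressions, into ANF, where every argument is an atom and each compound sub-expression is named by a let. The types Expr, Atom and AExpr and the fresh-name scheme t0, t1, ... are hypothetical simplifications, not GHC's Core or STG types. In STG, such a let would additionally allocate a thunk rather than evaluate its right-hand side.

```haskell
-- Source expressions: arguments may be arbitrary (nested) expressions.
data Expr
  = Var String
  | Lit Int
  | App Expr [Expr]
  deriving (Eq, Show)

-- Atoms: the only things allowed as arguments in ANF.
data Atom = AVar String | ALit Int
  deriving (Eq, Show)

-- ANF expressions: compound sub-expressions are named by lets.
data AExpr
  = AAtom Atom
  | AApp Atom [Atom]
  | ALet String AExpr AExpr
  deriving (Eq, Show)

-- | Convert to ANF, threading a counter for fresh names t0, t1, ...
toANF :: Expr -> AExpr
toANF expr = fst (go expr 0)
  where
    go :: Expr -> Int -> (AExpr, Int)
    go (Var x) n = (AAtom (AVar x), n)
    go (Lit i) n = (AAtom (ALit i), n)
    go (App f args) n0 =
      let (bindsF, fAtom, n1) = atomise f n0
          (bindsA, argAtoms, n2) = atomiseAll args n1
      in (wrap (bindsF ++ bindsA) (AApp fAtom argAtoms), n2)

    -- An atom for the expression, plus the lets that name its parts.
    atomise :: Expr -> Int -> ([(String, AExpr)], Atom, Int)
    atomise (Var x) n = ([], AVar x, n)
    atomise (Lit i) n = ([], ALit i, n)
    atomise e n =
      let (ae, n') = go e n
          t = "t" ++ show n'
      in ([(t, ae)], AVar t, n' + 1)

    atomiseAll :: [Expr] -> Int -> ([(String, AExpr)], [Atom], Int)
    atomiseAll [] n = ([], [], n)
    atomiseAll (e : es) n =
      let (b1, a, n1) = atomise e n
          (b2, rest, n2) = atomiseAll es n1
      in (b1 ++ b2, a : rest, n2)

    -- Nest the bindings in order, which makes evaluation order explicit.
    wrap :: [(String, AExpr)] -> AExpr -> AExpr
    wrap binds body = foldr (\(x, rhs) acc -> ALet x rhs acc) body binds
```

For instance, `f (g 1) x` becomes `let t0 = g 1 in f t0 x`, which in this encoding is `ALet "t0" (AApp (AVar "g") [ALit 1]) (AApp (AVar "f") [AVar "t0", AVar "x"])`.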
As usual in ANF, arguments to functions etc. are restricted to atoms (i.e., constants or variables), which implies that all sub-expressions are explicitly named and evaluation order is explicit. Specific to the STG language is that all let bindings correspond to closure allocation (thunks, function closures, and data constructors) and that case expressions encode both computation and case selection. There are two flavours of case expressions, scrutinising boxed and unboxed values, respectively. The former perform function calls, including demanding the evaluation of thunks, whereas the latter execute primitive operations (such as arithmetic on fixed-size integers and floating-point numbers).
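The restrictions described above can be captured directly in the types of a toy STG-like AST. In the sketch below, which is a hypothetical simplification and not GHC's StgSyn:

- arguments can only be atoms;
- every let right-hand side is one of the three kinds of closure;
- the two flavours of case are separate constructors.

The function allocations counts the let bindings syntactically present in an expression. Each of these allocates one closure when it is executed.

```haskell
-- A toy STG-like AST (hypothetical, much simpler than GHC's StgSyn).

-- Atoms: constants or variables, the only permitted arguments.
data Atom = AVar String | ALit Int
  deriving (Eq, Show)

-- Every let right-hand side allocates one of three kinds of closure.
data Rhs
  = Thunk Expr          -- suspended computation
  | Fun [String] Expr   -- function closure
  | Con String [Atom]   -- data constructor applied to atoms
  deriving (Eq, Show)

data Expr
  = AtomE Atom
  | App String [Atom]     -- function call with atomic arguments
  | PrimOp String [Atom]  -- primitive operation on unboxed values
  | Let String Rhs Expr   -- closure allocation
  -- Boxed case: evaluate the scrutinee (a call, possibly forcing a thunk),
  -- bind its value, and select an alternative by constructor.
  | CaseBoxed Expr String [(String, [String], Expr)]
  -- Unboxed case: run a primitive operation and select by literal value,
  -- with a default alternative.
  | CaseUnboxed Expr String [(Int, Expr)] Expr
  deriving (Eq, Show)

-- | Number of let bindings, and hence closure allocations, in the code.
allocations :: Expr -> Int
allocations (Let _ rhs body) = 1 + rhsAllocations rhs + allocations body
allocations (CaseBoxed scrut _ alts) =
  allocations scrut + sum [allocations e | (_, _, e) <- alts]
allocations (CaseUnboxed scrut _ alts def) =
  allocations scrut + sum [allocations e | (_, e) <- alts] + allocations def
allocations _ = 0

rhsAllocations :: Rhs -> Int
rhsAllocations (Thunk e) = allocations e
rhsAllocations (Fun _ e) = allocations e
rhsAllocations (Con _ _) = 0

-- let xs = Cons 1 nil
-- in case length xs of n { I# n# -> case n# +# 1 of m# { default -> m# } }
example :: Expr
example =
  Let "xs" (Con "Cons" [ALit 1, AVar "nil"]) $
    CaseBoxed (App "length" [AVar "xs"]) "n"
      [ ( "I#", ["n#"]
        , CaseUnboxed (PrimOp "+#" [AVar "n#", ALit 1]) "m#" [] (AtomE (AVar "m#")) )
      ]
```

Here `allocations example` is 1: only the constructor closure for xs is allocated, while the two case expressions perform a call and a primitive operation.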
The representation of the STG language defined in StgSyn abstracts over both binders and occurrences of variables. The type names involved in this generic definition all carry the prefix Gen (such as in GenStgBinding). Instances of these generic definitions, where both binders and occurrences are of type Id.Id, are defined as type synonyms and use type names that drop the Gen prefix (i.e., becoming plain StgBinding). Complete programs in STG form are represented by values of type [StgBinding].
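The pattern of generic Gen-prefixed types plus prefix-free type synonyms can be sketched as follows. The constructor names echo those in StgSyn, but the definitions are heavily simplified and hypothetical. Id is a stand-in newtype rather than GHC's Id.Id. The point is that a function such as binders can be written once over the generic types and then used at the concrete instance.

```haskell
-- Hypothetical stand-in for GHC's Id.Id.
newtype Id = Id String
  deriving (Eq, Show)

-- Generic forms, abstracting over binders (bndr) and occurrences (occ).
data GenStgArg occ
  = StgVarArg occ
  | StgLitArg Int
  deriving (Eq, Show)

data GenStgExpr bndr occ
  = StgApp occ [GenStgArg occ]
  | StgLet (GenStgBinding bndr occ) (GenStgExpr bndr occ)
  deriving (Eq, Show)

data GenStgBinding bndr occ
  = StgNonRec bndr (GenStgExpr bndr occ)
  | StgRec [(bndr, GenStgExpr bndr occ)]
  deriving (Eq, Show)

-- Concrete instances: binders and occurrences are both Ids, Gen prefix dropped.
type StgArg = GenStgArg Id
type StgExpr = GenStgExpr Id Id
type StgBinding = GenStgBinding Id Id

-- | Written once for the generic type, usable at any instance.
binders :: GenStgBinding bndr occ -> [bndr]
binders (StgNonRec b _) = [b]
binders (StgRec pairs) = map fst pairs

-- A complete program is a list of top-level bindings.
program :: [StgBinding]
program =
  [ StgNonRec (Id "main") (StgApp (Id "print") [StgVarArg (Id "x")])
  , StgRec [ (Id "f", StgApp (Id "g") [StgLitArg 1])
           , (Id "g", StgApp (Id "f") [StgLitArg 2]) ]
  ]
```

With these definitions, `concatMap binders program` returns `[Id "main", Id "f", Id "g"]`.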
Although the actual translation from Core AST into STG AST is performed by the function CoreToStg.coreToStg (or CoreToStg.coreExprToStg for individual expressions), the translation crucially depends on CorePrep.corePrepPgm (resp. CorePrep.corePrepExpr), which prepares Core code for code generation (for both byte code and machine code generation). CorePrep does the following:

- it saturates primitive and constructor applications;
- it turns the code into A-normal form;
- it renames all identifiers into globally unique names;
- it generates bindings for constructor workers, constructor wrappers, and record selectors;
- it performs some further cleanup.
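One CorePrep task, saturation of constructor applications, can be illustrated as follows. The sketch eta-expands a constructor applied to too few arguments into a lambda that supplies the missing ones. The expression type, the arity table and the naming scheme eta1, eta2, ... are hypothetical. A real implementation must generate genuinely fresh names, which this sketch does not guarantee.

```haskell
-- A tiny expression language with explicit constructor applications.
data Expr
  = Var String
  | Lam String Expr
  | App Expr Expr
  | Con String [Expr]  -- constructor applied to (possibly too few) arguments
  deriving (Eq, Show)

-- | Hypothetical table of constructor arities.
type Arities = [(String, Int)]

-- | Eta-expand unsaturated constructor applications so that every
-- constructor receives all its arguments.
saturate :: Arities -> Expr -> Expr
saturate arities = go
  where
    go (Var x) = Var x
    go (Lam x body) = Lam x (go body)
    go (App f a) = App (go f) (go a)
    go (Con c args) =
      let args' = map go args
          given = length args'
      in case lookup c arities of
           Just arity | arity > given ->
             -- NOTE: these names are not guaranteed fresh; a real pass would
             -- use a proper unique supply.
             let missing = ["eta" ++ show i | i <- [given + 1 .. arity]]
             in foldr Lam (Con c (args' ++ map Var missing)) missing
           _ -> Con c args'
```

With arity 2 for Cons, the partial application `Cons x` becomes `\eta2 -> Cons x eta2`, while saturated applications are left alone.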
In other words, after Core code is prepared for code generation, it is structurally already in the form required by the STG language. The main work performed by the actual transformation from Core to STG, as performed by CoreToStg.coreToStg, is to compute the live and free variables as well as live CAFs (constant applicative forms) at each let binding and case alternative. In subsequent phases, the live CAF information is used to compute SRTs. The live variable information is used to determine which stack slots need to be zapped (to avoid space leaks), and the free variable information is needed to construct closures. Moreover, hints for optimised code generation are computed, such as whether a closure needs to be updated after it has been evaluated.
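Free variables determine what a closure has to capture. The sketch below computes, for each non-recursive let in a toy expression language, the free variables of its right-hand side. The types and function names are hypothetical. Recursive bindings are omitted, and so are the live variables and CAF information that CoreToStg also computes.

```haskell
import Data.List (delete, nub, sort, union, (\\))

-- Toy expressions in which arguments are variables only, as in STG.
data Expr
  = Var String
  | App Expr [String]
  | Lam [String] Expr
  | Let String Expr Expr  -- non-recursive let: allocates a closure for the rhs
  deriving (Eq, Show)

-- | Free variables (duplicate-free list).
freeVars :: Expr -> [String]
freeVars (Var x) = [x]
freeVars (App f args) = freeVars f `union` nub args
freeVars (Lam xs body) = freeVars body \\ xs
freeVars (Let x rhs body) = freeVars rhs `union` delete x (freeVars body)

-- | For every let, the variables its closure has to capture.
letFreeVars :: Expr -> [(String, [String])]
letFreeVars (Var _) = []
letFreeVars (App f _) = letFreeVars f
letFreeVars (Lam _ body) = letFreeVars body
letFreeVars (Let x rhs body) =
  (x, sort (freeVars rhs)) : letFreeVars rhs ++ letFreeVars body
```

For `\y -> let f = \z -> g y z in f y`, the closure for f must capture g and y, whereas the whole expression only has g free.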
These days little actual work is performed on programs in STG form; in particular, the code is not further optimised. All serious optimisation (except low-level optimisations, which are performed during native code generation) has already been done on Core. The main task of SimplStg.stg2stg is to compute SRTs from the live CAF information determined during STG generation. Other than that, SCCfinal.stgMassageForProfiling is executed when compiling for profiling, and information may be dumped for debugging purposes.
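Here is a deliberately simplified sketch of how live CAF information can feed into SRTs. For a top-level binding, it collects the CAFs reachable through its references. It follows references to other, non-CAF top-level bindings transitively and stops at each CAF, whose own SRT takes care of what it references. The TopBind record and srtFor are hypothetical, and GHC's actual SRT construction differs in detail.

```haskell
import Data.List (nub, sort)

-- | Hypothetical summary of a top-level binding, as produced by an
-- earlier pass that records which top-level names each binding uses.
data TopBind = TopBind
  { bindName :: String
  , isCaf    :: Bool      -- a constant applicative form (top-level thunk)
  , refs     :: [String]  -- top-level names referenced by its code
  }

-- | CAFs reachable from a binding, following non-CAF top-level bindings
-- transitively and stopping at CAFs, whose own SRTs cover their references.
srtFor :: [TopBind] -> TopBind -> [String]
srtFor prog b = sort (nub (go [] (refs b)))
  where
    go :: [String] -> [String] -> [String]
    go _ [] = []
    go seen (n : ns)
      | n `elem` seen = go seen ns
      | otherwise =
          case [t | t <- prog, bindName t == n] of
            [t] | isCaf t   -> n : go (n : seen) ns
                | otherwise -> go (n : seen) (refs t ++ ns)
            _ -> go (n : seen) ns  -- not a top-level binding here: ignore

program :: [TopBind]
program =
  [ TopBind "table"  True  []
  , TopBind "helper" False ["table"]
  , TopBind "entry"  False ["helper", "map"]
  , TopBind "pureFn" False []
  ]
```

In this example, entry reaches the CAF table only indirectly, via helper, so its SRT is `["table"]`. The SRT of pureFn is empty, and the external name map is ignored.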
GHC's internal form of C-- is defined in the module Cmm. The definition is generic in that it abstracts over the type of static data and of the contents of basic blocks (i.e., over the concrete representation of constant data and instructions). These generic definitions have names carrying the prefix Gen (such as GenCmm). The same module also instantiates the generic form to a concrete form where data is represented by CmmStatic and instructions are represented by CmmStmt (giving us, e.g., Cmm from GenCmm). The concrete form more or less follows the external C-- language.
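The generic-plus-instance structure of Cmm can be sketched in the same style as the STG types. The names GenCmm, Cmm, CmmStatic and CmmStmt come from the text above. GenCmmTop, GenBasicBlock, mapInstrs, instrCount and all constructor definitions are hypothetical simplifications, far smaller than GHC's. mapInstrs shows the kind of traversal that abstracting over instructions allows, including a change of instruction type.

```haskell
-- Generic forms: d is the type of static data, i the type of instructions.
data GenBasicBlock i = BasicBlock String [i]  -- label and instructions
  deriving (Eq, Show)

data GenCmmTop d i
  = CmmData [d]                        -- a block of static data
  | CmmProc String [GenBasicBlock i]   -- a procedure: label and basic blocks
  deriving (Eq, Show)

newtype GenCmm d i = Cmm [GenCmmTop d i]
  deriving (Eq, Show)

-- Hypothetical, heavily simplified concrete static data and statements.
data CmmStatic = CmmStaticLit Integer | CmmString String
  deriving (Eq, Show)

data CmmStmt = CmmNop | CmmComment String | CmmAssign String Integer
  deriving (Eq, Show)

-- The concrete instance, with the Gen prefix dropped.
type Cmm = GenCmm CmmStatic CmmStmt

-- | Because the instruction type is a parameter, one traversal can rewrite
-- every instruction, even into a different instruction type.
mapInstrs :: (i -> j) -> GenCmm d i -> GenCmm d j
mapInstrs f (Cmm tops) = Cmm (map onTop tops)
  where
    onTop (CmmData ds) = CmmData ds
    onTop (CmmProc lbl blocks) =
      CmmProc lbl [BasicBlock l (map f is) | BasicBlock l is <- blocks]

-- | Total number of instructions in all procedures.
instrCount :: GenCmm d i -> Int
instrCount (Cmm tops) = sum [length is | CmmProc _ bs <- tops, BasicBlock _ is <- bs]

example :: Cmm
example =
  Cmm [ CmmData [CmmStaticLit 42]
      , CmmProc "f_entry" [BasicBlock "L1" [CmmComment "entry", CmmAssign "R1" 1]]
      ]
```

For instance, `mapInstrs show example` has type `GenCmm CmmStatic String`. It keeps the data and labels unchanged and replaces each statement with its textual form.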
Programs in STG form are translated to Cmm by CodeGen.codeGen.