From 2bb3a439c106935d97fae7f7a0b60c21493d1bef Mon Sep 17 00:00:00 2001
From: "simonpj@microsoft.com" <unknown>
Date: Fri, 11 Sep 2009 13:35:13 +0000
Subject: [PATCH] Comments and Cmm notes

---
 compiler/cmm/CmmContFlowOpt.hs |    1 +
 compiler/cmm/cmm-notes         |  204 ++++++++++++++++++++++++++++++++--------
 compiler/main/HscMain.lhs      |    4 +-
 3 files changed, 168 insertions(+), 41 deletions(-)
diff --git a/compiler/cmm/CmmContFlowOpt.hs b/compiler/cmm/CmmContFlowOpt.hs
index e0d9555..64a2315 100644
--- a/compiler/cmm/CmmContFlowOpt.hs
+++ b/compiler/cmm/CmmContFlowOpt.hs
@@ -36,6 +36,7 @@ cmmCfgOptsZ g =
         -- with a more exciting combination of optimisations
 
 runCmmOpts :: Tx g -> Tx (GenCmm d h g)
+-- Lifts a transformer on a single graph to one on the whole program
 runCmmOpts opt = mapProcs (optProc opt)
 
 optProc :: Tx g -> Tx (GenCmmTop d h g)
diff --git a/compiler/cmm/cmm-notes b/compiler/cmm/cmm-notes
index ee51624..5ec4895 100644
--- a/compiler/cmm/cmm-notes
+++ b/compiler/cmm/cmm-notes
@@ -2,6 +2,8 @@ Notes on new codegen (Sept 09)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Things to do:
+ - Consider module names
+
  - Top-level SRT threading is a bit ugly
 
  - Add type/newtype for CmmModule = [CmmGroup]    -- A module
@@ -12,6 +14,10 @@ Things to do:
    regardless of SplitObjs.   Question: can we *always* generate M.o if there
    is just one element in the list (rather than M/M1.o, M/M2.o etc)
 
+   One SRT per group.
+
+ - See "CAFs" below; we want to totally refactor the way SRTs are calculated
+
  - Change  
       type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)
    to
@@ -32,6 +38,11 @@ Things to do:
    type SubArea    = (Area, ByteOff, ByteWidth) 
    ByteOff should not be defined in SMRep -- that is too high up the hierarchy
    
+ - SMRep should not be imported by any module in cmm/!  Make it so.
+	-- ByteOff etc   ==>  CmmExpr
+        -- rET_SMALL etc ==> CmmInfo
+   Check that there are no other imports from codeGen in cmm/
+
  - Think about a non-flattened representation?
 
  - LastCall: 
@@ -66,48 +77,108 @@ Things to do:
         points that are not successors of a call, we think) can be treated
         uniformly: zero-size Area, and use inSP.
 
-Dead files
-~~~~~~~~~~
-CmmProcPoint (Michael Adams)
-CmmCPS (ditto)
+
+ - Currently AsmCodeGen top level calls AsmCodeGen.cmmToCmm, which is a small
+   C-- optimiser.  It has quite a lot of boilerplate folding code in AsmCodeGen
+   (cmmBlockConFold, cmmStmtConFold, cmmExprConFold), before calling out to
+   CmmOpt.  ToDo: see what optimisations are being done; and do them before
+   AsmCodeGen.
+
+ - Modularise the CPS pipeline; instead of ...; A;B;C; ...
+                                use  ..; ABC; ....
+
+ - Most of HscMain.tryNewCodeGen does not belong in HscMain.  Instead
+	if new_cg then
+             StgCmm.codeGen
+             processCmm  [including generating "raw" cmm]
+        else
+             CodeGen.codeGen
+             cmmToRawCmm
+
+
+ - If we stick CAF and stack liveness info on a LastCall node (not LastRet/Jump)
+   then all CAF and stack liveness stuff be completed before we split
+   into separate C procedures.
+
+   Short term:
+     compute and attach liveness into to LastCall
+     right at end, split, cvt to old rep
+     [must split before cvt, because old rep is not expressive enough]
+
+   Longer term: 
+     when old rep disappears, 
+     move the whole splitting game into the C back end *only*
+	 (guided by the procpoint set)
+
+      
+----------------------------------------------------
+	Modules in cmm/
+----------------------------------------------------
+
+-------- Dead stuff ------------
+CmmProcPoint        Dead: Michael Adams
+CmmCPS              Dead: Michael Adams
+CmmCPSGen.hs        Dead: Michael Adams
+CmmBrokenBlock.hs   Dead: Michael Adams
+CmmLive.hs          Dead: Michael Adams
+CmmProcPoint.hs     Dead: Michael Adams
+Dataflow.hs         Dead: Michael Adams
+StackColor.hs       Norman?
+StackPlacements.hs  Norman?
+
 HscMain.optionallyConvertAndOrCPS
         testCmmConversion
 DynFlags:  -fconvert-to-zipper-and-back, -frun-cps, -frun-cpsz
 
-Proc-points
-~~~~~~~~~~~~
-Consider this program, which has a diamond control flow, 
-with a call on one branch
- fn(p,x) {
-        h()
-	if b then { ... f(x) ...; q=5; goto J }
-             else { ...; q=7; goto J }
-     J: ..p...q...
-  }
-then the join point J is a "proc-point".  So, is 'p' passed to J
-as a parameter?  Or, if 'p' was saved on the stack anyway, perhaps
-to keep it alive across the call to h(), maybe 'p' gets communicated
-to J that way. This is an awkward choice.  (We think that we currently
-never pass variables to join points via arguments.)
+-------- Moribund stuff ------------
+CmmCvt.hs      Conversion between old and new Cmm reps
+CmmOpt.hs      Hopefully-redundant optimiser
+CmmZipUtil.hs  Only one function; move elsewhere
 
-Furthermore, there is *no way* to pass q to J in a register (other
-than a paramter register).
+-------- Stuff to keep ------------
+CmmCPSZ.hs	          Driver for new pipeline
 
-What we want is to do register allocation across the whole caboodle.
-Then we could drop all the code that deals with the above awkward
-decisions about spilling variables across proc-points.
+CmmLiveZ.hs	          Liveness analysis, dead code elim
+CmmProcPointZ.hs          Identifying and splitting out proc-points
 
-Note that J doesn't need an info table.
+CmmSpillReload.hs         Save and restore across calls
 
-What we really want is for each Block to have an optional info table.
-To do that, we need to be polymorphic over first nodes.
+CmmCommonBlockElimZ.hs    Common block elim
+CmmContFlowOpt.hs         Other optimisations (branch-chain, merging)
+
+CmmBuildInfoTables.hs     New info-table 
+CmmStackLayout.hs         and stack layout 
+CmmCallConv.hs
+CmmInfo.hs                Defn of InfoTables, and conversion to exact layout
+
+---------- Cmm data types --------------
+ZipCfgCmmRep.hs	    Cmm instantiations of dataflow graph framework
+MkZipCfgCmm.hs      Cmm instantiations of dataflow graph framework
+
+Cmm.hs	      Key module; a mix of old and new stuff
+                  so needs tidying up in due course
+CmmExpr.hs
+CmmUtils.hs
+CmmLint.hs
+
+PprC.hs	            Pretty print Cmm in C syntax
+PprCmm.hs	    Pretty printer for Cmm
+PprCmmZ.hs	    Additional stuff for zipper rep
+
+CLabel.hs     CLabel
+
+----------  Dataflow modules --------------
+   Goal: separate library; for now, separate directory
+
+MkZipCfg.hs
+ZipCfg.hs
+ZipCfgExtras.hs
+ZipDataflow.hs
+CmmTx.hs	      Transactions
+OptimizationFuel.hs   Fuel
+BlockId.hs    BlockId, BlockEnv, BlockSet
+DFMonad.hs	      
 
-Figuring out proc-points
-~~~~~~~~~~~~~~~~~~~~~~~~
-Proc-points are identified by
-CmmProcPointZ.minimalProcPointSet/extendPPSet Although there isn't
-that much code, JD thinks that it could be done much more nicely using
-a dominator analysis, using the Dataflow Engine.
 
 ----------------------------------------------------
       Top-level structure
@@ -146,12 +217,12 @@ cpsTop:
 
   * CmmProcPointZ.minimalProcPointSet
 	identify proc-points
+        no change to graph
 
   * CmmProcPointZ.addProcPointProtocols
 	something to do with the MA optimisation
         probably entirely unnecessary
 
-
   * Spill and reload:
      - CmmSpillReload.dualLivenessWithInsertion
        insert spills/reloads across 
@@ -170,7 +241,7 @@ cpsTop:
 
      - CmmStackLayout.layout
        Lay out the stack, returning an AreaMap
-         type AreaMap    = FiniteMap Area ByteOff
+         type AreaMap = FiniteMap Area ByteOff
           -- Byte offset of the oldest byte of the Area, 
           -- relative to the oldest byte of the Old Area
 
@@ -180,10 +251,53 @@ cpsTop:
    * Split into separate procedures
       - CmmProcPointZ.procPointAnalysis
         Given set of proc points, which blocks are reachable from each
+        Claim: too few proc-points => code duplication, but program still works??
 
       - CmmProcPointZ.splitAtProcPoints
 	Using this info, split into separate procedures
 
+      - CmmBuildInfoTables.setInfoTableStackMap
+	Attach stack maps to each info table
+
+
+----------------------------------------------------
+	Proc-points
+----------------------------------------------------
+
+Consider this program, which has a diamond control flow, 
+with a call on one branch
+ fn(p,x) {
+        h()
+	if b then { ... f(x) ...; q=5; goto J }
+             else { ...; q=7; goto J }
+     J: ..p...q...
+  }
+then the join point J is a "proc-point".  So, is 'p' passed to J
+as a parameter?  Or, if 'p' was saved on the stack anyway, perhaps
+to keep it alive across the call to h(), maybe 'p' gets communicated
+to J that way. This is an awkward choice.  (We think that we currently
+never pass variables to join points via arguments.)
+
+Furthermore, there is *no way* to pass q to J in a register (other
+than a paramter register).
+
+What we want is to do register allocation across the whole caboodle.
+Then we could drop all the code that deals with the above awkward
+decisions about spilling variables across proc-points.
+
+Note that J doesn't need an info table.
+
+What we really want is for each LastCall (not LastJump/Ret) 
+to have an info table.   Note that ProcPoints that are not successors
+of calls don't need an info table.
+
+Figuring out proc-points
+~~~~~~~~~~~~~~~~~~~~~~~~
+Proc-points are identified by
+CmmProcPointZ.minimalProcPointSet/extendPPSet Although there isn't
+that much code, JD thinks that it could be done much more nicely using
+a dominator analysis, using the Dataflow Engine.
+
 ----------------------------------------------------
 		CAFs
 ----------------------------------------------------
@@ -193,11 +307,11 @@ cpsTop:
   If f is live, then so is g.  f's SRT must include g's closure.
 
 * The CLabel for the entry-point/closure reveals whether g is 
-  a CAF (or refers to CAFs).  See the IdLabell constructor of CLabel.
+  a CAF (or refers to CAFs).  See the IdLabel constructor of CLabel.
 
 * The CAF-ness of the original top-level defininions is figured out
   (by TidyPgm) before we generate C--.  This CafInfo is only set for
-  top-level Ids; nested bindings stay with NoCafRefs.
+  top-level Ids; nested bindings stay with MayHaveCafRefs.
 
 * Currently an SRT contains (only) pointers to (top-level) closures.
 
@@ -210,10 +324,19 @@ cpsTop:
   This generates C-- roughly like this:
      f_closure: .word f_entry
      f_entry() [info-tbl-for-f] { ...jump g_entry...jump h2... }
-     g_entry() [info-tbl-for-g] { ...jump h1 }
+     g_entry() [info-tbl-for-g] { ...jump h1... }
 
   Note that there is no top-level closure for g (only an info table).
-  So:   info-tbl-for-f must have an SRT that keeps h1,h2 alive
+  This fact (whether or not there is a top-level closure) is recorded
+  in the InfoTable attached to the CmmProc for f, g
+  INVARIANT: 
+     Any out-of-Group references to an IdLabel goes to
+     a Proc whose InfoTable says "I have a top-level closure".
+  Equivalently: 
+     A CmmProc whose InfoTable says "I do not have a top-level
+     closure" is referred to only from its own Group.
+
+* So:   info-tbl-for-f must have an SRT that keeps h1,h2 alive
         info-tbl-for-g must have an SRT that keeps h1 (only) alive
 
   But if we just look for the free CAF refs, we get:
@@ -228,6 +351,9 @@ cpsTop:
   the attaching, right at the end of the pipeline.  The C_SRT part
   gives offsets within a single, shared table of closure pointers.
 
+* DECIDED: we can generate SRTs based on the final Cmm program
+  without knowledge of how it is generated.
+
 ----------------------------------------------------
 		Foreign calls
 ----------------------------------------------------
diff --git a/compiler/main/HscMain.lhs b/compiler/main/HscMain.lhs
index a334c70..1f32c35 100644
--- a/compiler/main/HscMain.lhs
+++ b/compiler/main/HscMain.lhs
@@ -802,8 +802,8 @@ tryNewCodeGen hsc_env this_mod data_tycons imported_mods
 	; prog <- return $ map (runTx $ runCmmOpts cmmCfgOptsZ) prog
 		-- Control flow optimisation
 
-        -- Note: Have to thread the module's SRT through all the procedures
-        -- because we greedily build it as we go.
+        -- We are building a single SRT for the entire module, so
+        -- we must thread it through all the procedures as we cps-convert them.
         ; us <- mkSplitUniqSupply 'S'
         ; let topSRT = initUs_ us emptySRT
 	; (topSRT, prog) <- foldM (protoCmmCPSZ hsc_env) (topSRT, []) prog
-- 
1.7.10.4