X-Git-Url: http://git.megacz.com/?p=ghc-hetmet.git;a=blobdiff_plain;f=compiler%2Fcmm%2Fcmm-notes;h=98c2e836994c8e05990c47205af7de371a4d2ff2;hp=5ec489571f223ec50a6c8b817295d4d1b621587e;hb=d25676a6b1c42495702048b6ca6f26ebd15205d8;hpb=2bb3a439c106935d97fae7f7a0b60c21493d1bef

diff --git a/compiler/cmm/cmm-notes b/compiler/cmm/cmm-notes
index 5ec4895..98c2e83 100644
--- a/compiler/cmm/cmm-notes
+++ b/compiler/cmm/cmm-notes
@@ -1,7 +1,146 @@
-Notes on new codegen (Sept 09)
+More notes (June 11)
+~~~~~~~~~~~~~~~~~~~~
+* Kill dead code assignArguments, argumentsSize in CmmCallConv.
+  Bake in ByteOff to ParamLocation and ArgumentFormat
+  CmmActuals -> [CmmActual]  similary CmmFormals
+
+* Possible refactoring: Nuke AGraph in favour of 
+      mkIfThenElse :: Expr -> Graph -> Graph -> FCode Graph
+  or even
+      mkIfThenElse :: HasUniques m => Expr -> Graph -> Graph -> m Graph
+  (Remmber that the .cmm file parser must use this function)
+
+  or parameterise FCode over its envt; the CgState part seem useful for both
+
+* Move top and tail calls to runCmmContFlowOpts from HscMain to CmmCps.cpsTop
+  (and rename the latter!)
+
+* "Remove redundant reloads" in CmmSpillReload should be redundant; since
+  insertLateReloads is now gone, every reload is reloading a live variable.
+  Test and nuke.
+
+* Sink and inline S(RegSlot(x)) = e in precisely the same way that we
+  sink and inline x = e
+
+* Stack layout is very like register assignment: find non-conflicting assigments.
+  In particular we can use colouring or linear scan (etc).
+
+  We'd fine-grain interference (on a word by word basis) to get maximum overlap.
+  But that may make very big interference graphs.  So linear scan might be
+  more attactive.
+
+  NB: linear scan does on-the-fly live range splitting.
+
+* When stubbing dead slots be careful not to write into an area that
+  overlaps with an area that's in use.  So stubbing needs to *follow* 
+  stack layout.
+
+
+More notes (May 11)
+~~~~~~~~~~~~~~~~~~~
+In CmmNode, consider spliting CmmCall into two: call and jump
+
+Notes on new codegen (Aug 10)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Things to do:
+ - We insert spills for variables before the stack check! This is the reason for
+   some fishy code in StgCmmHeap.entryHeapCheck where we are doing some strange
+	things to fix up the stack pointer before GC calls/jumps.
+
+	The reason spills are inserted before the sp check is that at the entry to a
+	function we always store the parameters passed in registers to local variables.
+	The spill pass simply inserts spills at variable definitions. We instead should
+	sink the spills so that we can avoid spilling them on branches that never
+	reload them.
+
+	This will fix the spill before stack check problem but only really as a side
+	effect. A 'real fix' probably requires making the spiller know about sp checks.
+
+   EZY: I don't understand this comment. David Terei, can you clarify?
+
+ - Proc points pass all arguments on the stack, adding more code and
+   slowing down things a lot. We either need to fix this or even better
+   would be to get rid of proc points.
+
+ - CmmInfo.cmmToRawCmm uses Old.Cmm, so it is called after converting Cmm.Cmm to
+   Old.Cmm. We should abstract it to work on both representations, it needs only to
+   convert a CmmInfoTable to [CmmStatic].
+
+ - The MkGraph currenty uses a different semantics for <*> than Hoopl. Maybe
+   we could convert codeGen/StgCmm* clients to the Hoopl's semantics?
+   It's all deeply unsatisfactory.
+
+ - Improve performance of Hoopl.
+
+   A nofib comparison of -fasm vs -fnewcodegen nofib compilation parameters
+   (using the same ghc-cmm branch +libraries compiled by the old codegenerator)
+   is at http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.oldghchoopl.txt
+   - the code produced is 10.9% slower, the compilation is +118% slower!
+
+   The same comparison with ghc-head with zip representation is at
+   http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.oldghczip.txt
+   - the code produced is 11.7% slower, the compilation is +78% slower.
+
+   When compiling nofib, ghc-cmm + libraries compiled with -fnew-codegen
+   is 23.7% slower (http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.hooplghcoldgen.txt).
+   When compiling nofib, ghc-head + libraries compiled with -fnew-codegen
+   is 31.4% slower (http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.zipghcoldgen.txt).
+
+   So we generate a bit better code, but it takes us longer!
+
+   EZY: Also importantly, Hoopl uses dramatically more memory than the
+   old code generator.
+
+ - Are all blockToNodeList and blockOfNodeList really needed? Maybe we could
+   splice blocks instead?
+
+   In the CmmContFlowOpt.blockConcat, using Dataflow seems too clumsy. Still,
+   a block catenation function would be probably nicer than blockToNodeList
+   / blockOfNodeList combo.
+
+ - lowerSafeForeignCall seems too lowlevel. Just use Dataflow. After that
+   delete splitEntrySeq from HooplUtils.
+
+ - manifestSP seems to touch a lot of the graph representation. It is
+   also slow for CmmSwitch nodes O(block_nodes * switch_statements).
+   Maybe rewrite manifestSP to use Dataflow?
+
+ - Sort out Label, LabelMap, LabelSet versus BlockId, BlockEnv, BlockSet
+   dichotomy. Mostly this means global replace, but we also need to make
+   Label an instance of Outputable (probably in the Outputable module).
+
+ - NB that CmmProcPoint line 283 has a hack that works around a GADT-related
+   bug in 6.10.
+
+ - SDM (2010-02-26) can we remove the Foreign constructor from Convention?
+   Reason: we never generate code for a function with the Foreign
+   calling convention, and the code for calling foreign calls is generated
+
+ - AsmCodeGen has a generic Cmm optimiser; move this into new pipeline
+   EZY (2011-04-16): The mini-inliner has been generalized and ported,
+   but the constant folding and other optimizations need to still be
+   ported.
+
+ - AsmCodeGen has post-native-cg branch eliminator (shortCutBranches);
+   we ultimately want to share this with the Cmm branch eliminator.
+
+ - At the moment, references to global registers like Hp are "lowered" 
+   late (in CgUtils.fixStgRegisters). We should do this early, in the
+	new native codegen, much in the way that we lower calling conventions.
+	Might need to be a bit sophisticated about aliasing.
+
+ - Question: currently we lift procpoints to become separate
+   CmmProcs.  Do we still want to do this?
+    
+   NB: and advantage of continuing to do this is that
+   we can do common-proc elimination!
+
+ - Move to new Cmm rep:
+     * Make native CG consume New Cmm; 
+     * Convert Old Cmm->New Cmm to keep old path alive
+     * Produce New Cmm when reading in .cmm files
+
  - Consider module names
 
  - Top-level SRT threading is a bit ugly
@@ -18,22 +157,8 @@ Things to do:
 
  - See "CAFs" below; we want to totally refactor the way SRTs are calculated
 
- - Change  
-      type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)
-   to
-      type CmmZ = GenCmm CmmStatic (CmmInfo, CmmStackInfo) CmmGraph
-	-- And perhaps take opportunity to prune CmmInfo?
-
- - Clarify which fields of CmmInfo are still used
- - Maybe get rid of CmmFormals arg of CmmProc in all versions?
-
- - We aren't sure whether cmmToRawCmm is actively used by the new pipeline; check
-   And what does CmmBuildInfoTables do?!
-
- - Nuke CmmZipUtil, move zipPreds into ZipCfg
-
  - Pull out Areas into its own module
-   Parameterise AreaMap
+   Parameterise AreaMap (note there are type synonyms in CmmStackLayout!)
    Add ByteWidth = Int
    type SubArea    = (Area, ByteOff, ByteWidth) 
    ByteOff should not be defined in SMRep -- that is too high up the hierarchy
@@ -43,6 +168,9 @@ Things to do:
         -- rET_SMALL etc ==> CmmInfo
    Check that there are no other imports from codeGen in cmm/
 
+ - If you eliminate a label by branch chain elimination,
+   what happens if there's an Area associated with that label?
+
  - Think about a non-flattened representation?
 
  - LastCall: 
@@ -65,7 +193,7 @@ Things to do:
    http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGenPipeline
 
 
- - We believe that all of CmmProcPointZ.addProcPointProtocols is dead.  What
+ - We believe that all of CmmProcPoint.addProcPointProtocols is dead.  What
    goes wrong if we simply never call it?
 
  - Something fishy in CmmStackLayout.hs
@@ -110,75 +238,57 @@ Things to do:
      move the whole splitting game into the C back end *only*
 	 (guided by the procpoint set)
 
-      
 ----------------------------------------------------
 	Modules in cmm/
 ----------------------------------------------------
 
--------- Dead stuff ------------
-CmmProcPoint        Dead: Michael Adams
-CmmCPS              Dead: Michael Adams
-CmmCPSGen.hs        Dead: Michael Adams
-CmmBrokenBlock.hs   Dead: Michael Adams
-CmmLive.hs          Dead: Michael Adams
-CmmProcPoint.hs     Dead: Michael Adams
-Dataflow.hs         Dead: Michael Adams
-StackColor.hs       Norman?
-StackPlacements.hs  Norman?
-
+-------- Testing stuff ------------
 HscMain.optionallyConvertAndOrCPS
         testCmmConversion
-DynFlags:  -fconvert-to-zipper-and-back, -frun-cps, -frun-cpsz
+DynFlags:  -fconvert-to-zipper-and-back, -frun-cpsz
 
 -------- Moribund stuff ------------
+OldCmm.hs      Definition of flowgraph of old representation
+OldCmmUtil.hs  Utilites that operates mostly on on CmmStmt
+OldPprCmm.hs   Pretty print for CmmStmt, GenBasicBlock and ListGraph
 CmmCvt.hs      Conversion between old and new Cmm reps
 CmmOpt.hs      Hopefully-redundant optimiser
-CmmZipUtil.hs  Only one function; move elsewhere
 
 -------- Stuff to keep ------------
-CmmCPSZ.hs	          Driver for new pipeline
+CmmCPS.hs                 Driver for new pipeline
 
-CmmLiveZ.hs	          Liveness analysis, dead code elim
-CmmProcPointZ.hs          Identifying and splitting out proc-points
+CmmLive.hs                Liveness analysis, dead code elim
+CmmProcPoint.hs           Identifying and splitting out proc-points
 
 CmmSpillReload.hs         Save and restore across calls
 
-CmmCommonBlockElimZ.hs    Common block elim
+CmmCommonBlockElim.hs     Common block elim
 CmmContFlowOpt.hs         Other optimisations (branch-chain, merging)
 
 CmmBuildInfoTables.hs     New info-table 
 CmmStackLayout.hs         and stack layout 
 CmmCallConv.hs
-CmmInfo.hs                Defn of InfoTables, and conversion to exact layout
+CmmInfo.hs                Defn of InfoTables, and conversion to exact byte layout
 
 ---------- Cmm data types --------------
-ZipCfgCmmRep.hs	    Cmm instantiations of dataflow graph framework
-MkZipCfgCmm.hs      Cmm instantiations of dataflow graph framework
+Cmm.hs              Cmm instantiations of dataflow graph framework
+MkGraph.hs          Interface for building Cmm for codeGen/Stg*.hs modules
+
+CmmDecl.hs          Shared Cmm types of both representations
+CmmExpr.hs          Type of Cmm expression
+CmmType.hs          Type of Cmm types and their widths
+CmmMachOp.hs        MachOp type and accompanying utilities
 
-Cmm.hs	      Key module; a mix of old and new stuff
-                  so needs tidying up in due course
-CmmExpr.hs
 CmmUtils.hs
 CmmLint.hs
 
 PprC.hs	            Pretty print Cmm in C syntax
-PprCmm.hs	    Pretty printer for Cmm
-PprCmmZ.hs	    Additional stuff for zipper rep
-
-CLabel.hs     CLabel
-
-----------  Dataflow modules --------------
-   Goal: separate library; for now, separate directory
-
-MkZipCfg.hs
-ZipCfg.hs
-ZipCfgExtras.hs
-ZipDataflow.hs
-CmmTx.hs	      Transactions
-OptimizationFuel.hs   Fuel
-BlockId.hs    BlockId, BlockEnv, BlockSet
-DFMonad.hs	      
+PprCmm.hs	    Pretty printer for CmmGraph.
+PprCmmDecl.hs       Pretty printer for common Cmm types.
+PprCmmExpr.hs       Pretty printer for Cmm expressions.
 
+CLabel.hs           CLabel
+BlockId.hs          BlockId, BlockEnv, BlockSet
 
 ----------------------------------------------------
       Top-level structure
@@ -194,7 +304,7 @@ DFMonad.hs
 * HscMain.tryNewCodeGen
     - STG->Cmm:    StgCmm.codeGen (new codegen)
     - Optimise:    CmmContFlowOpt (simple optimisations, very self contained)
-    - Cps convert: CmmCPSZ.protoCmmCPSZ 
+    - Cps convert: CmmCPS.protoCmmCPS 
     - Optimise:    CmmContFlowOpt again
     - Convert:     CmmCvt.cmmOfZgraph (convert to old rep) very self contained
 
@@ -203,23 +313,23 @@ DFMonad.hs
 
 
 ----------------------------------------------------
-      CmmCPSZ.protoCmmCPSZ   The new pipeline
+      CmmCPS.protoCmmCPS   The new pipeline
 ----------------------------------------------------
 
-CmmCPSZprotoCmmCPSZ:
+CmmCPS.protoCmmCPS:
    1. Do cpsTop for each procedures separately
    2. Build SRT representation; this spans multiple procedures
 	(unless split-objs)
 
 cpsTop:
-  * CmmCommonBlockElimZ.elimCommonBlocks:
+  * CmmCommonBlockElim.elimCommonBlocks:
 	eliminate common blocks 
 
-  * CmmProcPointZ.minimalProcPointSet
+  * CmmProcPoint.minimalProcPointSet
 	identify proc-points
         no change to graph
 
-  * CmmProcPointZ.addProcPointProtocols
+  * CmmProcPoint.addProcPointProtocols
 	something to do with the MA optimisation
         probably entirely unnecessary
 
@@ -228,8 +338,8 @@ cpsTop:
        insert spills/reloads across 
 	   LastCalls, and 
 	   Branches to proc-points
-     Now sink those reloads:
-     - CmmSpillReload.insertLateReloads
+     Now sink those reloads (and other instructions):
+     - CmmSpillReload.rewriteAssignments
      - CmmSpillReload.removeDeadAssignmentsAndReloads
 
   * CmmStackLayout.stubSlotsOnDeath
@@ -249,11 +359,11 @@ cpsTop:
        Manifest the stack pointer
 
    * Split into separate procedures
-      - CmmProcPointZ.procPointAnalysis
+      - CmmProcPoint.procPointAnalysis
         Given set of proc points, which blocks are reachable from each
         Claim: too few proc-points => code duplication, but program still works??
 
-      - CmmProcPointZ.splitAtProcPoints
+      - CmmProcPoint.splitAtProcPoints
 	Using this info, split into separate procedures
 
       - CmmBuildInfoTables.setInfoTableStackMap
@@ -279,7 +389,7 @@ to J that way. This is an awkward choice.  (We think that we currently
 never pass variables to join points via arguments.)
 
 Furthermore, there is *no way* to pass q to J in a register (other
-than a paramter register).
+than a parameter register).
 
 What we want is to do register allocation across the whole caboodle.
 Then we could drop all the code that deals with the above awkward
@@ -294,7 +404,7 @@ of calls don't need an info table.
 Figuring out proc-points
 ~~~~~~~~~~~~~~~~~~~~~~~~
 Proc-points are identified by
-CmmProcPointZ.minimalProcPointSet/extendPPSet Although there isn't
+CmmProcPoint.minimalProcPointSet/extendPPSet Although there isn't
 that much code, JD thinks that it could be done much more nicely using
 a dominator analysis, using the Dataflow Engine.
 
@@ -347,7 +457,7 @@ a dominator analysis, using the Dataflow Engine.
   f's keep-alive refs to include h1.
 
 * The SRT info is the C_SRT field of Cmm.ClosureTypeInfo in a
-  CmmInfoTable attached to each CmmProc.  CmmCPSZ.toTops actually does
+  CmmInfoTable attached to each CmmProc.  CmmCPS.toTops actually does
   the attaching, right at the end of the pipeline.  The C_SRT part
   gives offsets within a single, shared table of closure pointers.
 
@@ -358,7 +468,7 @@ a dominator analysis, using the Dataflow Engine.
 		Foreign calls
 ----------------------------------------------------
 
-See Note [Foreign calls] in ZipCfgCmmRep!  This explains that a safe
+See Note [Foreign calls] in CmmNode!  This explains that a safe
 foreign call must do this:
   save thread state
   push info table (on thread stack) to describe frame
@@ -393,7 +503,7 @@ NEW PLAN for foreign calls:
 		Cmm representations
 ----------------------------------------------------
 
-* Cmm.hs
+* CmmDecl.hs
      The type [GenCmm d h g] represents a whole module, 
 	** one list element per .o file **
 	Without SplitObjs, the list has exactly one element
@@ -408,7 +518,7 @@ NEW PLAN for foreign calls:
 
 
 -------------
-OLD BACK END representations (Cmm.hs):  
+OLD BACK END representations (OldCmm.hs):  
       type Cmm = GenCmm CmmStatic CmmInfo (ListGraph CmmStmt)
 				-- A whole module
       newtype ListGraph i = ListGraph [GenBasicBlock i]
@@ -423,49 +533,47 @@ OLD BACK END representations (Cmm.hs):
   
 -------------
 NEW BACK END representations 
-* Not Cmm-specific at all
-    ZipCfg.hs defines  Graph, LGraph, FGraph,
-                       ZHead, ZTail, ZBlock ...
+* Uses Hoopl library, a zero-boot package
+* CmmNode defines a node of a flow graph.
+* Cmm defines CmmGraph, CmmTop, Cmm
+   - CmmGraph is a closed/closed graph + an entry node.
 
-              classes  LastNode, HavingSuccessors
+       data CmmGraph = CmmGraph { g_entry :: BlockId
+                                , g_graph :: Graph CmmNode C C }
 
-    MkZipCfg.hs: AGraph: building graphs
+   - CmmTop is a top level chunk, specialization of GenCmmTop from CmmDecl.hs
+       with CmmGraph as a flow graph.
+   - Cmm is a collection of CmmTops.
 
-* ZipCfgCmmRep: instantiates ZipCfg for Cmm
-      data Middle = ...CmmExpr...
-      data Last = ...CmmExpr...
-      type CmmGraph = Graph Middle Last
+       type Cmm          = GenCmm    CmmStatic CmmTopInfo CmmGraph
+       type CmmTop       = GenCmmTop CmmStatic CmmTopInfo CmmGraph
 
-      type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)
-      type CmmStackInfo = (ByteOff, Maybe ByteOff)
-                -- (SP offset on entry, update frame space = SP offset on exit)
-		-- The new codegen produces CmmZ, but once the stack is 
-		-- manifested we can drop that in favour of 
-		--    GenCmm CmmStatic CmmInfo CmmGraph
+   - CmmTop uses CmmTopInfo, which is a CmmInfoTable and CmmStackInfo
 
-      Inside a CmmProc:
-	   - CLabel: used
-	   - CmmInfo: partly used by NEW
-           - CmmFormals: not used at all  PERHAPS NOT EVEN BY OLD PIPELINE!
+       data CmmTopInfo   = TopInfo {info_tbl :: CmmInfoTable, stack_info :: CmmStackInfo}
 
-* MkZipCfgCmm.hs: smart constructors for ZipCfgCmmRep
-   Depends on (a) MkZipCfg (Cmm-independent)
-   	      (b) ZipCfgCmmRep (Cmm-specific)
+   - CmmStackInfo
 
--------------
-* SHARED stuff
-  CmmExpr.hs defines the Cmm expression types
-	- CmmExpr, CmmReg, Width, CmmLit, LocalReg, GlobalReg
-	- CmmType, Width etc   (saparate module?)
-	- MachOp               (separate module?)
-	- Area, AreaId etc     (separate module?)
+       data CmmStackInfo = StackInfo {arg_space :: ByteOff, updfr_space :: Maybe ByteOff}
 
-  BlockId.hs defines  BlockId, BlockEnv, BlockSet
+         * arg_space = SP offset on entry
+         * updfr_space space = SP offset on exit
+       Once the staci is manifested, we could drom CmmStackInfo, ie. get
+         GenCmm CmmStatic CmmInfoTable CmmGraph, but we do not do that currently.
 
--------------
 
+* MkGraph.hs: smart constructors for Cmm.hs
+  Beware, the CmmAGraph defined here does not use AGraph from Hoopl,
+  as CmmAGraph can be opened or closed at exit, See the notes in that module.
+
+-------------
+* SHARED stuff
+  CmmDecl.hs - GenCmm and GenCmmTop types
+  CmmExpr.hs - defines the Cmm expression types
+             - CmmExpr, CmmReg, CmmLit, LocalReg, GlobalReg
+             - Area, AreaId etc     (separate module?)
+  CmmType.hs - CmmType, Width etc   (saparate module?)
+  CmmMachOp.hs - MachOp and CallishMachOp types
 
+  BlockId.hs defines  BlockId, BlockEnv, BlockSet
 -------------
-* Transactions indicate whether or not the result changes: CmmTx 
-     type Tx a = a -> TxRes a
-     data TxRes a = TxRes ChangeFlag a