swap <[]> and <{}> syntax

diff --git a/compiler/cmm/cmm-notes b/compiler/cmm/cmm-notes

index 0852711..4a87911 100644
--- a/compiler/cmm/cmm-notes
+++ b/compiler/cmm/cmm-notes
@@ -1,3 +1,38 @@
+More notes (June 11)
+~~~~~~~~~~~~~~~~~~~~
+* Possible refactoring: Nuke AGraph in favour of
+      mkIfThenElse :: Expr -> Graph -> Graph -> FCode Graph
+  or even
+      mkIfThenElse :: HasUniques m => Expr -> Graph -> Graph -> m Graph
+  (Remember that the .cmm file parser must use this function)
+
+  or parameterise FCode over its environment; the CgState part seems useful for both
+
+* "Remove redundant reloads" in CmmSpillReload should be redundant; since
+  insertLateReloads is now gone, every reload is reloading a live variable.
+  Test and nuke.
+
+* Sink and inline S(RegSlot(x)) = e in precisely the same way that we
+  sink and inline x = e
+
+* Stack layout is very like register assignment: find non-conflicting assignments.
+  In particular we can use colouring or linear scan (etc.).
+
+  We'd want fine-grained interference (on a word-by-word basis) to get maximum overlap.
+  But that may make very big interference graphs.  So linear scan might be
+  more attractive.
+
+  NB: linear scan does on-the-fly live range splitting.
+
+* When stubbing dead slots, be careful not to write into an area that
+  overlaps with an area that's in use.  So stubbing needs to *follow*
+  stack layout.
+
+
+More notes (May 11)
+~~~~~~~~~~~~~~~~~~~
+In CmmNode, consider splitting CmmCall into two: call and jump
+
  Notes on new codegen (Aug 10)\r
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r
  \r
  Notes on new codegen (Aug 10)\r
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r
  \r
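Sketch for the June 11 stack-layout note in the hunk above: linear scan applied to stack slots instead of registers. This is a minimal illustration assuming each variable has a single live interval over some linear numbering of Cmm nodes and needs exactly one word. `Interval`, `Slot` and `assignSlots` are hypothetical names, not the CmmStackLayout API, and unlike a real linear scan this version does no live-range splitting.

```haskell
import Data.List (partition, sort, sortOn)
import qualified Data.Map as Map

-- Hypothetical live interval: the variable is live from ivStart to ivEnd
-- (inclusive) in some linear numbering of the Cmm nodes.
data Interval = Interval
  { ivName  :: String
  , ivStart :: Int
  , ivEnd   :: Int
  } deriving (Show, Eq)

-- A stack slot, counted in words from the base of the frame.
-- Assumption: every variable fits in exactly one word.
type Slot = Int

-- Linear scan over intervals sorted by start point.  When an interval
-- starts, every active interval that has already ended gives its slot back,
-- so variables whose live ranges do not overlap can share a slot.
assignSlots :: [Interval] -> Map.Map String Slot
assignSlots ivs = go (sortOn ivStart ivs) [] [] 0 Map.empty
  where
    -- active: (end point, slot) of intervals that may still be live
    -- free:   slots released by expired intervals, lowest first
    -- next:   first slot that has never been handed out
    go [] _ _ _ acc = acc
    go (iv : rest) active free next acc =
      let (expired, live) = partition (\(end, _) -> end < ivStart iv) active
          free'           = sort (free ++ map snd expired)
          (slot, free'', next') = case free' of
            (s : ss) -> (s, ss, next)
            []       -> (next, [], next + 1)
      in go rest ((ivEnd iv, slot) : live) free'' next'
            (Map.insert (ivName iv) slot acc)
```

With x live over [0,3], y over [1,2] and z over [4,6], z starts after both x and y have died, so it reuses slot 0 and the frame needs two words rather than three. Colouring a word-by-word interference graph is the alternative the note weighs against this.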
@@ -15,14 +50,11 @@ Things to do:
         This will fix the spill before stack check problem but only really as a side
         effect. A 'real fix' probably requires making the spiller know about sp checks.
  
- - There is some silly stuff happening with the Sp. We end up with code like:
-   Sp = Sp + 8; R1 = _vwf::I64; Sp = Sp -8
-       Seems to be perhaps caused by the issue above but also maybe a optimisation
-       pass needed?
+   EZY: I don't understand this comment. David Terei, can you clarify?
  
  
- - Proc pass all arguments on the stack, adding more code and slowing down things
-   a lot. We either need to fix this or even better would be to get rid of
-       proc points.
+ - Proc points pass all arguments on the stack, adding more code and
+   slowing things down a lot. We either need to fix this or, even better,
+   get rid of proc points.
  
   - CmmInfo.cmmToRawCmm uses Old.Cmm, so it is called after converting Cmm.Cmm to
     Old.Cmm. We should abstract it to work on both representations, it needs only to
@@ -32,7 +64,7 @@ Things to do:
     we could convert codeGen/StgCmm* clients to Hoopl's semantics?
     It's all deeply unsatisfactory.
  
- - Improve preformance of Hoopl.
+ - Improve performance of Hoopl.
  
     A nofib comparison of -fasm vs -fnewcodegen nofib compilation parameters
     (using the same ghc-cmm branch +libraries compiled by the old code generator)
@@ -50,6 +82,9 @@ Things to do:
  
     So we generate a bit better code, but it takes us longer!
  
+   EZY: Also importantly, Hoopl uses dramatically more memory than the
+   old code generator.
+
   - Are all blockToNodeList and blockOfNodeList really needed? Maybe we could
     splice blocks instead?
  
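On the question of splicing blocks instead of going through blockToNodeList / blockOfNodeList: a minimal sketch, assuming a toy shape-indexed block type loosely modelled on Hoopl's open/closed (O/C) shapes. `Node`, `Block`, `BCat` and `spliceMiddle` are hypothetical names for illustration, not Hoopl's actual definitions. Because catenation is a constructor here, splicing is O(1) and never round-trips through a node list.

```haskell
{-# LANGUAGE GADTs #-}

-- Shape tags in the style of Hoopl: O = open, C = closed.
data O
data C

-- Toy nodes: a label closes the entry, an assignment is open at both ends,
-- and a jump closes the exit.
data Node e x where
  Label  :: String -> Node C O
  Assign :: String -> Int -> Node O O
  Jump   :: String -> Node O C

-- Toy blocks: catenation is a constructor, so splicing costs O(1).
data Block e x where
  BNil  :: Block O O
  BNode :: Node e x -> Block e x
  BCat  :: Block e O -> Block O x -> Block e x

-- Splice a run of middle nodes between an entry and an exit without
-- converting anything to or from a list of nodes.
spliceMiddle :: Block C O -> Block O O -> Block O C -> Block C C
spliceMiddle entry mids exit = entry `BCat` (mids `BCat` exit)

renderNode :: Node e x -> String
renderNode (Label l)    = l ++ ":"
renderNode (Assign v n) = v ++ " = " ++ show n
renderNode (Jump l)     = "goto " ++ l

-- Flatten a block, only for inspection.
render :: Block e x -> [String]
render BNil       = []
render (BNode n)  = [renderNode n]
render (BCat a b) = render a ++ render b
```

The shape indices make the type checker reject ill-formed splices, such as putting a label in the middle of a block, which a list-based round trip can only catch at run time.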
@@ -57,7 +92,7 @@ Things to do:
     a block catenation function would probably be nicer than blockToNodeList
     / blockOfNodeList combo.
  
- - loweSafeForeignCall seems too lowlevel. Just use Dataflow. After that
+ - lowerSafeForeignCall seems too low-level. Just use Dataflow. After that
     delete splitEntrySeq from HooplUtils.
  
   - manifestSP seems to touch a lot of the graph representation. It is
@@ -76,6 +111,9 @@ Things to do:
     calling convention, and the code for calling foreign calls is generated
  
   - AsmCodeGen has a generic Cmm optimiser; move this into the new pipeline
+   EZY (2011-04-16): The mini-inliner has been generalized and ported,
+   but the constant folding and other optimizations still need to be
+   ported.
  
   - AsmCodeGen has post-native-cg branch eliminator (shortCutBranches);
     we ultimately want to share this with the Cmm branch eliminator.
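EZY's note says the constant folding from AsmCodeGen's optimiser still has to be ported. A minimal sketch of bottom-up constant folding over a toy expression type, assuming unbounded `Integer` literals (real Cmm literals carry a width, so a real pass must also respect wrap-around at that width). `Expr` and `constFold` are hypothetical, not the CmmOpt API.

```haskell
-- A toy subset of Cmm expressions.
data Expr
  = Lit Integer
  | Reg String
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Show, Eq)

-- Fold children first, then simplify the node using the folded children:
-- literal arithmetic plus the identities x + 0 = x and x * 1 = x.
constFold :: Expr -> Expr
constFold (Add a b) =
  case (constFold a, constFold b) of
    (Lit x, Lit y) -> Lit (x + y)
    (Lit 0, e)     -> e
    (e, Lit 0)     -> e
    (a', b')       -> Add a' b'
constFold (Mul a b) =
  case (constFold a, constFold b) of
    (Lit x, Lit y) -> Lit (x * y)
    (Lit 1, e)     -> e
    (e, Lit 1)     -> e
    (a', b')       -> Mul a' b'
constFold e = e
```

Folding the children first lets simplifications cascade: in `x + (1 * 0)` the product becomes `0`, after which `x + 0` collapses to `x`.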
@@ -113,7 +151,7 @@ Things to do:
   - See "CAFs" below; we want to totally refactor the way SRTs are calculated
  
   - Pull out Areas into its own module
-   Parameterise AreaMap
+   Parameterise AreaMap (note there are type synonyms in CmmStackLayout!)
     Add ByteWidth = Int
     type SubArea    = (Area, ByteOff, ByteWidth)
     ByteOff should not be defined in SMRep -- that is too high up the hierarchy
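The proposed `ByteWidth` and `SubArea` synonyms can be sketched as follows, together with the kind of overlap test that the stubbing and stack-layout notes call for. `Area` here is a simplified stand-in (GHC's real `Area` type is different), and the convention that `(a, off, w)` covers bytes `[off, off + w)` of area `a` is an assumption made for this sketch.

```haskell
-- Simplified stand-in for Area: the caller's frame, or a young area named
-- by the label of the block it belongs to.
data Area = Old | Young String
  deriving (Show, Eq)

type ByteOff   = Int
type ByteWidth = Int
type SubArea   = (Area, ByteOff, ByteWidth)

-- Assumption: (a, off, w) covers bytes [off, off + w) of area a.
-- Two sub-areas overlap only if they are in the same area and their
-- half-open byte ranges intersect.
overlaps :: SubArea -> SubArea -> Bool
overlaps (a1, o1, w1) (a2, o2, w2) =
  a1 == a2 && o1 < o2 + w2 && o2 < o1 + w1
```

Using half-open ranges means adjacent sub-areas such as `(Old, 0, 8)` and `(Old, 8, 8)` do not count as overlapping.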
@@ -210,7 +248,7 @@ CmmCvt.hs      Conversion between old and new Cmm reps
  CmmOpt.hs      Hopefully-redundant optimiser
  
  -------- Stuff to keep ------------
-CmmCPS.hs                 Driver for new pipeline
+CmmPipeline.hs            Driver for new pipeline
  
  CmmLive.hs                Liveness analysis, dead code elim
  CmmProcPoint.hs           Identifying and splitting out proc-points
@@ -257,24 +295,24 @@ BlockId.hs          BlockId, BlockEnv, BlockSet
        type RawCmm = GenCmm CmmStatic [CmmStatic] (ListGraph CmmStmt)
  
  * HscMain.tryNewCodeGen
-    - STG->Cmm:    StgCmm.codeGen (new codegen)
-    - Optimise:    CmmContFlowOpt (simple optimisations, very self contained)
-    - Cps convert: CmmCPS.protoCmmCPS
-    - Optimise:    CmmContFlowOpt again
-    - Convert:     CmmCvt.cmmOfZgraph (convert to old rep) very self contained
+    - STG->Cmm:         StgCmm.codeGen (new codegen)
+    - Optimize and CPS: CmmPipeline.cmmPipeline
+    - Convert:          CmmCvt.cmmOfZgraph (convert to old rep) very self contained
  
  * StgCmm.hs  The new STG -> Cmm conversion code generator
    Lots of modules StgCmmXXX
  
  
  ----------------------------------------------------
-      CmmCPS.protoCmmCPS   The new pipeline
+      CmmPipeline.cmmPipeline   The new pipeline
  ----------------------------------------------------
  
-CmmCPS.protoCmmCPS:
-   1. Do cpsTop for each procedures separately
-   2. Build SRT representation; this spans multiple procedures
-       (unless split-objs)
+CmmPipeline.cmmPipeline:
+   1. Do control flow optimization
+   2. Do cpsTop for each procedure separately
+   3. Build SRT representation; this spans multiple procedures
+        (unless split-objs)
+   4. Do control flow optimization on all resulting procedures
  
  cpsTop:
    * CmmCommonBlockElim.elimCommonBlocks:
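The four numbered steps of `CmmPipeline.cmmPipeline` in this hunk can be pictured as plain function composition. This is a toy sketch, assuming a `Proc` that merely records which passes touched it and an `Srt` that lists the procedures it covers; none of these types or functions are GHC's (the real pipeline works on Cmm graphs inside a monad with a unique supply). `cpsTop` returns a list because one procedure may be split into several at proc-points, which is why step 4 runs over all resulting procedures.

```haskell
-- Toy procedure: its name plus the passes applied to it, in order.
data Proc = Proc { procName :: String, history :: [String] }
  deriving (Show, Eq)

-- Toy SRT: the names of the procedures whose CAF references it covers.
newtype Srt = Srt [String]
  deriving (Show, Eq)

record :: String -> Proc -> Proc
record pass p = p { history = history p ++ [pass] }

controlFlowOpt :: Proc -> Proc
controlFlowOpt = record "cfo"

-- cpsTop may split one procedure into several (e.g. at proc-points);
-- this toy version never splits.
cpsTop :: Proc -> [Proc]
cpsTop p = [record "cpsTop" p]

-- The SRT step is the only one that needs all procedures at once.
buildSrt :: [Proc] -> Srt
buildSrt = Srt . map procName

cmmPipelineSketch :: [Proc] -> ([Proc], Srt)
cmmPipelineSketch procs =
  let optimised = map controlFlowOpt procs      -- 1. control flow optimization
      cpsd      = concatMap cpsTop optimised    -- 2. cpsTop per procedure
      srt       = buildSrt cpsd                 -- 3. SRT across procedures
      final     = map controlFlowOpt cpsd       -- 4. optimize the results again
  in (final, srt)
```

The point of the shape is that steps 1, 2 and 4 are per-procedure maps, while step 3 is the one place that must see the whole group.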
@@ -293,8 +331,8 @@ cpsTop:
         insert spills/reloads across
            LastCalls, and
            Branches to proc-points
-     Now sink those reloads:
-     - CmmSpillReload.insertLateReloads
+     Now sink those reloads (and other instructions):
+     - CmmSpillReload.rewriteAssignments
       - CmmSpillReload.removeDeadAssignmentsAndReloads
  
    * CmmStackLayout.stubSlotsOnDeath
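The change above sinks reloads and other instructions in `rewriteAssignments`. A minimal sketch of one such sinking step on a single straight-line block, assuming block-local temporaries that are dead at the end of the block: an assignment `x = e` is inlined into the one statement that uses `x`, provided nothing in between reassigns `x` or any variable that `e` reads. `Stmt`, `Store`, `tryInline` and `sinkBlock` are hypothetical names; the real pass works on Hoopl graphs and handles far more cases.

```haskell
import qualified Data.Set as Set

data Expr = Var String | Lit Int | Add Expr Expr
  deriving (Show, Eq)

-- Store stands in for any statement that consumes an expression.
data Stmt = Assign String Expr | Store Expr
  deriving (Show, Eq)

freeVars :: Expr -> Set.Set String
freeVars (Var v)   = Set.singleton v
freeVars (Lit _)   = Set.empty
freeVars (Add a b) = freeVars a `Set.union` freeVars b

-- Number of occurrences of variable x in an expression.
count :: String -> Expr -> Int
count x (Var v)   = if v == x then 1 else 0
count _ (Lit _)   = 0
count x (Add a b) = count x a + count x b

usesIn :: String -> Stmt -> Int
usesIn x (Assign _ e) = count x e
usesIn x (Store e)    = count x e

subst :: String -> Expr -> Expr -> Expr
subst x e (Var v) | v == x = e
subst x e (Add a b)        = Add (subst x e a) (subst x e b)
subst _ _ other            = other

substStmt :: String -> Expr -> Stmt -> Stmt
substStmt x e (Assign y r) = Assign y (subst x e r)
substStmt x e (Store r)    = Store (subst x e r)

-- Try to move x = e into its single use later in the block.
-- Assumes x is a block-local temporary, dead at the end of the block.
tryInline :: String -> Expr -> [Stmt] -> Maybe [Stmt]
tryInline x e rest
  | sum (map (usesIn x) rest) /= 1 = Nothing
  | otherwise = go rest
  where
    blockers = Set.insert x (freeVars e)
    go [] = Nothing
    go (s : ss)
      -- Check the use first: a statement's right-hand side is evaluated
      -- before its own assignment takes effect.
      | usesIn x s == 1 = Just (substStmt x e s : ss)
      | clobbers s      = Nothing
      | otherwise       = (s :) <$> go ss
    clobbers (Assign y _) = y `Set.member` blockers
    clobbers (Store _)    = False

sinkBlock :: [Stmt] -> [Stmt]
sinkBlock [] = []
sinkBlock (s@(Assign x e) : rest)
  | Just rest' <- tryInline x e rest = sinkBlock rest'
  | otherwise                        = s : sinkBlock rest
sinkBlock (s : rest) = s : sinkBlock rest
```

Assignments with zero uses are left in place rather than deleted, so dead-code removal stays the job of a separate pass such as removeDeadAssignmentsAndReloads.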
@@ -344,7 +382,7 @@ to J that way. This is an awkward choice.  (We think that we currently
  never pass variables to join points via arguments.)
  
  Furthermore, there is *no way* to pass q to J in a register (other
-than a paramter register).
+than a parameter register).
  
  What we want is to do register allocation across the whole caboodle.
  Then we could drop all the code that deals with the above awkward
@@ -412,7 +450,7 @@ a dominator analysis, using the Dataflow Engine.
    f's keep-alive refs to include h1.
  
  * The SRT info is the C_SRT field of Cmm.ClosureTypeInfo in a
-  CmmInfoTable attached to each CmmProc.  CmmCPS.toTops actually does
+  CmmInfoTable attached to each CmmProc.  CmmPipeline.toTops actually does
    the attaching, right at the end of the pipeline.  The C_SRT part
    gives offsets within a single, shared table of closure pointers.
  
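The idea of per-procedure SRT info as offsets into one shared table of closure pointers can be illustrated with a toy construction: take the union of every procedure's CAF references as the shared table, then describe each procedure by the offset of its first entry plus a bitmap over the entries from there on. `SrtInfo`, `buildSrts` and this offset-plus-bitmap layout are assumptions for the sketch, not GHC's `C_SRT` definition, and no attempt is made to order the table so that bitmaps stay short.

```haskell
import Data.Bits (setBit)
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical SRT descriptor: an offset into the shared table, plus a
-- bitmap whose bit i says whether table entry (offset + i) is referenced.
data SrtInfo = NoSrt | Srt Int Integer
  deriving (Show, Eq)

-- Input: for each procedure, the set of CAF labels it keeps alive.
-- Output: the shared table and each procedure's descriptor into it.
buildSrts :: Map.Map String (Set.Set String)
          -> ([String], Map.Map String SrtInfo)
buildSrts procCafs = (table, Map.map describe procCafs)
  where
    table = Set.toAscList (Set.unions (Map.elems procCafs))
    index = Map.fromList (zip table [0 ..])
    describe cafs
      | Set.null cafs = NoSrt
      | otherwise =
          let ixs = map (index Map.!) (Set.toAscList cafs)
              off = minimum ixs
          in Srt off (foldl setBit 0 (map (subtract off) ixs))
```

For example, if f references c1 and c3 while g references only c2, the shared table is [c1, c2, c3]; f's descriptor is offset 0 with bitmap 0b101 and g's is offset 1 with bitmap 0b1.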