X-Git-Url: http://git.megacz.com/?p=ghc-hetmet.git;a=blobdiff_plain;f=compiler%2Fcmm%2Fcmm-notes;h=98c2e836994c8e05990c47205af7de371a4d2ff2;hp=0852711f96dde654736a19cd213f6d0e5b7edfd9;hb=d25676a6b1c42495702048b6ca6f26ebd15205d8;hpb=889c084e943779e76d19f2ef5e970ff655f511eb

diff --git a/compiler/cmm/cmm-notes b/compiler/cmm/cmm-notes
index 0852711..98c2e83 100644
--- a/compiler/cmm/cmm-notes
+++ b/compiler/cmm/cmm-notes
@@ -1,3 +1,45 @@
+More notes (June 11)
+~~~~~~~~~~~~~~~~~~~~
+* Kill dead code assignArguments, argumentsSize in CmmCallConv.
+  Bake in ByteOff to ParamLocation and ArgumentFormat
+  CmmActuals -> [CmmActual]  similarly CmmFormals
+
+* Possible refactoring: Nuke AGraph in favour of 
+      mkIfThenElse :: Expr -> Graph -> Graph -> FCode Graph
+  or even
+      mkIfThenElse :: HasUniques m => Expr -> Graph -> Graph -> m Graph
+  (Remember that the .cmm file parser must use this function)
+
+  or parameterise FCode over its envt; the CgState part seems useful for both
+
+* Move top and tail calls to runCmmContFlowOpts from HscMain to CmmCps.cpsTop
+  (and rename the latter!)
+
+* "Remove redundant reloads" in CmmSpillReload should be redundant; since
+  insertLateReloads is now gone, every reload is reloading a live variable.
+  Test and nuke.
+
+* Sink and inline S(RegSlot(x)) = e in precisely the same way that we
+  sink and inline x = e
+
+* Stack layout is very like register assignment: find non-conflicting assignments.
+  In particular we can use colouring or linear scan (etc).
+
+  We'd want fine-grained interference (on a word-by-word basis) to get maximum overlap.
+  But that may make very big interference graphs.  So linear scan might be
+  more attractive.
+
+  NB: linear scan does on-the-fly live range splitting.
+
+* When stubbing dead slots be careful not to write into an area that
+  overlaps with an area that's in use.  So stubbing needs to *follow* 
+  stack layout.
+
+
+More notes (May 11)
+~~~~~~~~~~~~~~~~~~~
+In CmmNode, consider splitting CmmCall into two: call and jump
+
 Notes on new codegen (Aug 10)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -15,14 +57,11 @@ Things to do:
 	This will fix the spill before stack check problem but only really as a side
 	effect. A 'real fix' probably requires making the spiller know about sp checks.
 
- - There is some silly stuff happening with the Sp. We end up with code like:
-   Sp = Sp + 8; R1 = _vwf::I64; Sp = Sp -8
-	Seems to be perhaps caused by the issue above but also maybe a optimisation
-	pass needed?
+   EZY: I don't understand this comment. David Terei, can you clarify?
 
- - Proc pass all arguments on the stack, adding more code and slowing down things
-   a lot. We either need to fix this or even better would be to get rid of
-	proc points.
+ - Proc points pass all arguments on the stack, adding more code and
+   slowing down things a lot. We either need to fix this or even better
+   would be to get rid of proc points.
 
  - CmmInfo.cmmToRawCmm uses Old.Cmm, so it is called after converting Cmm.Cmm to
    Old.Cmm. We should abstract it to work on both representations, it needs only to
@@ -32,7 +71,7 @@ Things to do:
    we could convert codeGen/StgCmm* clients to the Hoopl's semantics?
    It's all deeply unsatisfactory.
 
- - Improve preformance of Hoopl.
+ - Improve performance of Hoopl.
 
    A nofib comparison of -fasm vs -fnewcodegen nofib compilation parameters
    (using the same ghc-cmm branch +libraries compiled by the old codegenerator)
@@ -50,6 +89,9 @@ Things to do:
 
    So we generate a bit better code, but it takes us longer!
 
+   EZY: Also importantly, Hoopl uses dramatically more memory than the
+   old code generator.
+
  - Are all blockToNodeList and blockOfNodeList really needed? Maybe we could
    splice blocks instead?
 
@@ -57,7 +99,7 @@ Things to do:
    a block catenation function would be probably nicer than blockToNodeList
    / blockOfNodeList combo.
 
- - loweSafeForeignCall seems too lowlevel. Just use Dataflow. After that
+ - lowerSafeForeignCall seems too low-level. Just use Dataflow. After that
    delete splitEntrySeq from HooplUtils.
 
  - manifestSP seems to touch a lot of the graph representation. It is
@@ -76,6 +118,9 @@ Things to do:
    calling convention, and the code for calling foreign calls is generated
 
  - AsmCodeGen has a generic Cmm optimiser; move this into new pipeline
+   EZY (2011-04-16): The mini-inliner has been generalized and ported,
+   but the constant folding and other optimizations still need to be
+   ported.
 
  - AsmCodeGen has post-native-cg branch eliminator (shortCutBranches);
    we ultimately want to share this with the Cmm branch eliminator.
@@ -113,7 +158,7 @@ Things to do:
  - See "CAFs" below; we want to totally refactor the way SRTs are calculated
 
  - Pull out Areas into its own module
-   Parameterise AreaMap
+   Parameterise AreaMap (note there are type synonyms in CmmStackLayout!)
    Add ByteWidth = Int
    type SubArea    = (Area, ByteOff, ByteWidth) 
    ByteOff should not be defined in SMRep -- that is too high up the hierarchy
@@ -293,8 +338,8 @@ cpsTop:
        insert spills/reloads across 
 	   LastCalls, and 
 	   Branches to proc-points
-     Now sink those reloads:
-     - CmmSpillReload.insertLateReloads
+     Now sink those reloads (and other instructions):
+     - CmmSpillReload.rewriteAssignments
      - CmmSpillReload.removeDeadAssignmentsAndReloads
 
   * CmmStackLayout.stubSlotsOnDeath
@@ -344,7 +389,7 @@ to J that way. This is an awkward choice.  (We think that we currently
 never pass variables to join points via arguments.)
 
 Furthermore, there is *no way* to pass q to J in a register (other
-than a paramter register).
+than a parameter register).
 
 What we want is to do register allocation across the whole caboodle.
 Then we could drop all the code that deals with the above awkward