Important performance wibble to callSiteInline (the n_vals_wanted > 0 thing)
author simonpj@microsoft.com <unknown>
Tue, 9 Sep 2008 15:50:11 +0000 (15:50 +0000)
committer simonpj@microsoft.com <unknown>
Tue, 9 Sep 2008 15:50:11 +0000 (15:50 +0000)
See Note [Inlining in ArgCtxt].  This very small change gives quite a
big performance win.  nofib results, showing only the bigger changes:

        Program           Size    Allocs   Runtime
--------------------------------------------------------------------------------
           anna          -0.7%     -4.3%      0.15
       cichelli          -0.6%     -6.4%      0.15
         fulsom          -0.4%    -18.5%     -8.1%
            gcd          -0.6%    -12.0%      0.06
        integer          -0.6%    -16.2%     -8.4%
          power          -0.7%    -19.3%     -4.8%
--------------------------------------------------------------------------------
            Min          -0.7%    -19.3%    -15.7%
            Max          -0.1%     +0.1%     +5.7%
 Geometric Mean          -0.6%     -1.9%     -4.3%

The original change was to improve a case that Roman found (see test
eyeball/inline1) but that seems to work ok now anyway.
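
For readers without the surrounding code to hand, here is a minimal sketch of
the three-way context test that the hunk below modifies.  It is a simplified
model, not the code in CoreUnfold.lhs: the function name interestingCtxt, the
argument-free ArgCtxt constructor and the camelCase parameter names are all
assumptions made for illustration, and the real callSiteInline weighs further
conditions (size, discounts, argument information) that are left out here.

```haskell
module Main where

-- Simplified stand-in for the call-site context type (an assumption:
-- the real ArgCtxt carries fields, which are omitted here).
data CallCtxt
  = BoringCtxt   -- nothing useful is known about the context
  | CaseCtxt     -- the call is the scrutinee of a case expression
  | ArgCtxt      -- the call is the argument of another function
  deriving (Eq, Show)

-- Hypothetical helper: is this call site interesting enough to inline at?
interestingCtxt
  :: Bool      -- isTop: is the identifier bound at top level?
  -> Bool      -- loneVariable: does it occur applied to no arguments?
  -> Bool      -- isValue: is its unfolding already a value?
  -> Int       -- nValsWanted: value arguments the unfolding expects
  -> CallCtxt
  -> Bool
interestingCtxt isTop loneVariable isValue nValsWanted ctxt =
  case ctxt of
    BoringCtxt -> not isTop && nValsWanted > 0
    CaseCtxt   -> not loneVariable || not isValue
    ArgCtxt    -> nValsWanted > 0   -- before this patch: True

main :: IO ()
main = do
  -- x = I# 3# passed as the argument in (g x): x wants no value arguments,
  -- so ArgCtxt alone no longer justifies inlining it.
  print (interestingCtxt True True True 0 ArgCtxt)    -- False
  -- A function whose unfolding wants two value arguments still qualifies.
  print (interestingCtxt True False True 2 ArgCtxt)   -- True
```

With the old alternative (ArgCtxt {} -> True) the first call would also be
True, which is the case the Note describes as inlining top-level stuff into
useless places.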

compiler/coreSyn/CoreUnfold.lhs

index 5797cba..c630277 100644
@@ -595,8 +595,8 @@ callSiteInline dflags active_inline id lone_variable arg_infos cont_info
                        = case cont_info of
                            BoringCtxt -> not is_top && n_vals_wanted > 0       -- Note [Nested functions] 
                            CaseCtxt   -> not lone_variable || not is_value     -- Note [Lone variables]
-                           ArgCtxt {} -> True
-                               -- Was: n_vals_wanted > 0; but see test eyeball/inline1.hs
+                           ArgCtxt {} -> n_vals_wanted > 0 
+                               -- See Note [Inlining in ArgCtxt]
 
                    small_enough = (size - discount) <= opt_UF_UseThreshold
                    discount = computeDiscount n_vals_wanted arg_discounts 
@@ -640,6 +640,19 @@ branches.  Then inlining it doesn't increase allocation, but it does
 increase the chance that the constructor won't be allocated at all in
 the branches that don't use it.
 
+Note [Inlining in ArgCtxt]
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+The condition (n_vals_wanted > 0) here is very important, because otherwise
+we end up inlining top-level stuff into useless places; eg
+   x = I# 3#
+   f = \y.  g x
+This can make a very big difference: it adds 16% to nofib 'integer' allocs,
+and 20% to 'power'.
+
+At one stage I replaced this condition by 'True' (leading to the above 
+slow-down).  The motivation was test eyeball/inline1.hs; but that seems
+to work ok now.
+
 Note [Lone variables]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The "lone-variable" case is important.  I spent ages messing about