From e71d6d1f458685b6a20f6d02433667be1d4f7a26 Mon Sep 17 00:00:00 2001
From: "simonpj@microsoft.com"
Date: Tue, 9 Sep 2008 15:50:11 +0000
Subject: [PATCH] Important performance wibble to callSiteInline (the n_vals_wanted > 0 thing)

See Note [Inlining in ArgCtxt].
This very small change gives quite a big performance win.
Just showing the bigger ones:

        Program           Size    Allocs   Runtime
--------------------------------------------------------------------------------
           anna          -0.7%     -4.3%      0.15
       cichelli          -0.6%     -6.4%      0.15
         fulsom          -0.4%    -18.5%     -8.1%
            gcd          -0.6%    -12.0%      0.06
        integer          -0.6%    -16.2%     -8.4%
          power          -0.7%    -19.3%     -4.8%
--------------------------------------------------------------------------------
            Min          -0.7%    -19.3%    -15.7%
            Max          -0.1%     +0.1%     +5.7%
 Geometric Mean          -0.6%     -1.9%     -4.3%

The original change was to improve a case that Roman found (see test
eyeball/inline1) but that seems to work ok now anyway.
---
 compiler/coreSyn/CoreUnfold.lhs | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/compiler/coreSyn/CoreUnfold.lhs b/compiler/coreSyn/CoreUnfold.lhs
index 5797cba..c630277 100644
--- a/compiler/coreSyn/CoreUnfold.lhs
+++ b/compiler/coreSyn/CoreUnfold.lhs
@@ -595,8 +595,8 @@ callSiteInline dflags active_inline id lone_variable arg_infos cont_info
           = case cont_info of
               BoringCtxt -> not is_top && n_vals_wanted > 0   -- Note [Nested functions]
               CaseCtxt   -> not lone_variable || not is_value -- Note [Lone variables]
-              ArgCtxt {} -> True
-                -- Was: n_vals_wanted > 0; but see test eyeball/inline1.hs
+              ArgCtxt {} -> n_vals_wanted > 0
+                -- See Note [Inlining in ArgCtxt]
 
         small_enough = (size - discount) <= opt_UF_UseThreshold
         discount = computeDiscount n_vals_wanted arg_discounts
@@ -640,6 +640,19 @@ branches.  Then inlining it doesn't increase allocation, but it does
 increase the chance that the constructor won't be allocated at all in
 the branches that don't use it.
 
+Note [Inlining in ArgCtxt]
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+The condition (n_vals_wanted > 0) here is very important, because otherwise
+we end up inlining top-level stuff into useless places; eg
+   x = I# 3#
+   f = \y.  g x
+This can make a very big difference: it adds 16% to nofib 'integer' allocs,
+and 20% to 'power'.
+
+At one stage I replaced this condition by 'True' (leading to the above
+slow-down).  The motivation was test eyeball/inline1.hs; but that seems
+to work ok now.
+
 Note [Lone variables]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The "lone-variable" case is important.  I spent ages messing about
-- 
1.7.10.4
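
Illustration (not part of the patch): the effect of the ArgCtxt change can be
sketched as a small standalone model. Everything here is a simplified
assumption for illustration: the CallCtxt type, the function names
interestingCtxt and oldInterestingCtxt, and the plain Bool/Int parameters are
hypothetical stand-ins, not the definitions in CoreUnfold.lhs. Only the three
case alternatives are taken from the hunk above.

```haskell
module Main where

-- Hypothetical, simplified stand-in for the call-site context used in
-- callSiteInline. The real ArgCtxt carries fields; this toy one does not.
data CallCtxt = BoringCtxt | CaseCtxt | ArgCtxt
  deriving (Eq, Show)

-- Toy version of the context test after the patch.
-- Parameter meanings are assumptions mirroring the names in the diff:
--   isTop        : the binding is top-level
--   isValue      : the unfolding is a value (e.g. a constructor application)
--   loneVariable : the variable occurs applied to no arguments
--   nValsWanted  : number of value arguments the unfolding expects
interestingCtxt :: Bool -> Bool -> Bool -> Int -> CallCtxt -> Bool
interestingCtxt isTop isValue loneVariable nValsWanted ctxt =
  case ctxt of
    BoringCtxt -> not isTop && nValsWanted > 0
    CaseCtxt   -> not loneVariable || not isValue
    ArgCtxt    -> nValsWanted > 0   -- the condition restored by the patch

-- Same test as it stood before the patch: any argument context counted
-- as interesting, whatever the unfolding looked like.
oldInterestingCtxt :: Bool -> Bool -> Bool -> Int -> CallCtxt -> Bool
oldInterestingCtxt isTop isValue loneVariable nValsWanted ctxt =
  case ctxt of
    ArgCtxt -> True
    _       -> interestingCtxt isTop isValue loneVariable nValsWanted ctxt

main :: IO ()
main = do
  -- Models x = I# 3# occurring as the argument in (g x):
  -- top-level, a value, a lone variable, and wanting zero value arguments.
  putStrLn ("before patch: " ++ show (oldInterestingCtxt True True True 0 ArgCtxt))
  putStrLn ("after patch:  " ++ show (interestingCtxt True True True 0 ArgCtxt))
  -- A two-argument function passed as an argument still qualifies.
  putStrLn ("function arg: " ++ show (interestingCtxt True False True 2 ArgCtxt))
```

In this model the x = I# 3# case from the Note is considered interesting
before the patch and not after it, while an unfolding that wants value
arguments is treated the same either way. The real decision also depends on
the size and discount checks shown in the first hunk, which this sketch
leaves out.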