Important performance wibble to callSiteInline (the n_vals_wanted > 0 test)

See Note [Inlining in ArgCtxt]. This very small change gives quite a
big performance win. Only the programs with the bigger changes are shown:

        Program           Size    Allocs   Runtime
--------------------------------------------------------------------------------
           anna          -0.7%     -4.3%      0.15
       cichelli          -0.6%     -6.4%      0.15
         fulsom          -0.4%    -18.5%     -8.1%
            gcd          -0.6%    -12.0%      0.06
        integer          -0.6%    -16.2%     -8.4%
          power          -0.7%    -19.3%     -4.8%
--------------------------------------------------------------------------------
            Min          -0.7%    -19.3%    -15.7%
            Max          -0.1%     +0.1%     +5.7%
 Geometric Mean          -0.6%     -1.9%     -4.3%

The original change was meant to improve a case that Roman found (see test
eyeball/inline1), but that case seems to work OK now anyway.