In decodeFloat_Int# we have the C-- code:
mp_tmp1 = Sp - WDS(1);
mp_tmp_w = Sp - WDS(2);
/* arguments: F1 = Float# */
arg = F1;
/* Perform the operation */
foreign "C" __decodeFloat_Int(mp_tmp1 "ptr", mp_tmp_w "ptr", arg) [];
/* returns: (Int# (mantissa), Int# (exponent)) */
RET_NN(W_[mp_tmp1], W_[mp_tmp_w]);
This all looks quite reasonable. The problem is that RET_NN() might
assign the results to the stack (with an unregisterised back end), and
in this case the arguments to RET_NN() are loaded from the same stack
slots that the results will be assigned to.
The code generator should do the right thing here, but it didn't: it
assumed that it could assign the results sequentially, so an earlier
assignment could overwrite a slot that a later one still had to read.
The fix is one line: use emitSimultaneously rather than emitStmts (plus
comments).
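
To illustrate the difference between the two strategies, here is a toy
model in Haskell. It is only a sketch and not the code generator's
implementation: the names Slot, Stack, Assign, runSequential,
runSimultaneous, swap and initial are all hypothetical, the stack is
modelled as a Data.Map, and "simultaneous" is modelled by reading every
source from the original state before any write happens.

```haskell
import qualified Data.Map as M

-- Hypothetical model: a stack is a map from slot number to value.
type Slot  = Int
type Stack = M.Map Slot Int

-- Copy the value currently held in 'src' into 'dest'.
data Assign = Assign { dest :: Slot, src :: Slot }

-- Sequential: each assignment sees the writes made by earlier ones.
runSequential :: [Assign] -> Stack -> Stack
runSequential assigns st = foldl step st assigns
  where
    step s (Assign d r) = M.insert d (s M.! r) s

-- Simultaneous: every source is read from the original state first,
-- and only then are the destinations written.
runSimultaneous :: [Assign] -> Stack -> Stack
runSimultaneous assigns st = foldl write st reads
  where
    reads = [ (d, st M.! r) | Assign d r <- assigns ]
    write s (d, v) = M.insert d v s

-- Two results whose sources overlap with each other's destinations
-- (an assumed layout, chosen to make the clash visible).
swap :: [Assign]
swap = [Assign 2 1, Assign 1 2]

initial :: Stack
initial = M.fromList [(1, 10), (2, 20)]
```

With these definitions, runSequential swap initial leaves 10 in both
slots (the value 20 is lost), whereas runSimultaneous swap initial
yields slot 1 = 20 and slot 2 = 10.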
emitRetUT args = do
tickyUnboxedTupleReturn (length args) -- TICK
(sp, stmts) <- pushUnboxedTuple 0 args
- emitStmts stmts
+ emitSimultaneously stmts -- NB. the args might overlap with the stack slots
+ -- or regs that we assign to, so better use
+ -- simultaneous assignments here (#3546)
when (sp /= 0) $ stmtC (CmmAssign spReg (cmmRegOffW spReg (-sp)))
stmtC (CmmJump (entryCode (CmmLoad (cmmRegOffW spReg sp) bWord)) [])
-- TODO (when using CPS): emitStmt (CmmReturn (map snd args))