Add RULES for realToFrac from Int.
{-# RULES
"realToFrac/Int->Double" realToFrac = int2Double
"realToFrac/Int->Float" realToFrac = int2Float
#-}
Note that this only matters for realToFrac. If you've been using
fromIntegral to promote Ints to Doubles, things should be fine as they are.
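As a minimal sketch of what the rule replaces: without a rule, realToFrac
is the generic fromRational . toRational, which goes through an
arbitrary-precision Rational. The helper names viaRational and viaPrim
below are made up for illustration; only int2Double (from GHC.Float) and
the Prelude functions are real. The sketch only checks that the two paths
agree on a few values, it does not demonstrate the rule firing.

```haskell
import GHC.Float (int2Double)

-- Generic path: what realToFrac does when no rewrite rule applies.
-- (Hypothetical helper name, for illustration only.)
viaRational :: Int -> Double
viaRational = fromRational . toRational

-- Direct path: the primitive conversion the rule rewrites to.
-- (Hypothetical helper name, for illustration only.)
viaPrim :: Int -> Double
viaPrim = int2Double

main :: IO ()
main = do
  let xs = [0, 1, -7, 39999999] :: [Int]
  -- Both conversions should produce identical Doubles.
  print (map viaRational xs == map viaPrim xs)
  -- realToFrac and fromIntegral should agree for Int -> Double as well.
  print (map realToFrac xs == (map fromIntegral xs :: [Double]))
```

Both lines should print True: the results are the same, only the cost of
getting there differs.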
The following program, using stream fusion to eliminate arrays:
import Data.Array.Vector

n = 40000000

main = do
    let c = replicateU n (2::Double)
        a = mapU realToFrac (enumFromToU 0 (n-1) ) :: UArr Double
    print (sumU (zipWithU (*) c a))
Yields this loop body without the RULE:
case $wtoRational sc_sY4 of ww_aM7 { (# ww1_aM9, ww2_aMa #) ->
case $wfromRat ww1_aM9 ww2_aMa of tpl_X1P { D# ipv_sW3 ->
Main.$s$wfold
(+# sc_sY4 1)
(+# wild_X1i 1)
(+## sc2_sY6 (*## 2.0 ipv_sW3))
And with the rule:
Main.$s$wfold
(+# sc_sXT 1)
(+# wild_X1h 1)
(+## sc2_sXV (*## 2.0 (int2Double# sc_sXT)))
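For readers who don't read Core, the loop with the rule applied
corresponds roughly to the hand-written worker below. This is only a
sketch: the names fusedSum and go are invented here, and the mapping of
the two counters onto the two fused streams is my reading of the Core,
not something GHC reports.

```haskell
{-# LANGUAGE BangPatterns #-}
import GHC.Float (int2Double)

-- Hypothetical hand-written equivalent of the fused loop: two counters
-- (one per fused stream) and an unboxed-style Double accumulator.
fusedSum :: Int -> Double
fusedSum n = go 0 0 0
  where
    go :: Int -> Int -> Double -> Double
    go !i !j !acc
      | i == n    = acc  -- first stream (the replicated 2.0s) is exhausted
      | j > n - 1 = acc  -- second stream (0 .. n-1) is exhausted
      | otherwise = go (i + 1) (j + 1) (acc + 2.0 * int2Double j)

main :: IO ()
main = print (fusedSum 1000)
```

With n = 1000 this sums 2*j for j from 0 to 999, so it should print
999000.0.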
With the rule, the running time of the program goes from 120 seconds to
0.198 seconds with the native backend, and to 0.143 seconds with the C
backend.
And just so I don't forget, here's the difference in the resulting
assembly (x86_64) between the native code generator and the
C backend.
-fasm
Main_zdszdwfold_info:
movq %rdi,%rax
cmpq $40000000,%rax
jne .LcZK
jmp *(%rbp)
.LcZK:
cmpq $39999999,%rsi
jg .LcZN
cvtsi2sdq %rsi,%xmm0
mulsd .LnZP(%rip),%xmm0
movsd %xmm5,%xmm7
addsd %xmm0,%xmm7
incq %rax
incq %rsi
movq %rax,%rdi
movsd %xmm7,%xmm5
jmp Main_zdszdwfold_info
With the C backend (-fvia-C -optc-O3) we get even better assembly. gcc
replaces the multiplication by 2.0 with an addsd and avoids the extra
register moves:
Main_zdszdwfold_info:
cmpq $40000000, %rdi
je .L9
.L5:
cmpq $39999999, %rsi
jg .L9
cvtsi2sdq %rsi, %xmm0
leaq 1(%rdi), %rdi
addq $1, %rsi
addsd %xmm0, %xmm0
addsd %xmm0, %xmm5
jmp Main_zdszdwfold_info
.L9:
jmp *(%rbp)
So this might make a useful test case once the native codegen project starts up.