+++ /dev/null
-This directory contains mpn functions for various HP PA-RISC chips. Code
-that runs faster on the PA7100 and later implementations, is in the pa7100
-directory.
-
-RELEVANT OPTIMIZATION ISSUES
-
- Load and Store timing
-
-On the PA7000 no memory instructions can issue the two cycles after a store.
-For the PA7100, this is reduced to one cycle.
-
-The PA7100 has a lookup-free cache, so it helps to schedule loads and the
-dependent instruction really far from each other.
-
-STATUS
-
-1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
- instructions below (but some sw pipelining is needed to avoid the
- xmpyu-fstds delay):
-
- fldds s1_ptr
-
- xmpyu
- fstds N(%r30)
- xmpyu
- fstds N(%r30)
-
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
-
- addc
- stws res_ptr
- addc
- stws res_ptr
-
- addib Loop
-
-2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
- (asymptotically) on the PA7100, using the instructions below. With proper
- sw pipelining and the unrolling level below, the speed becomes 8
- cycles/limb.
-
- fldds s1_ptr
- fldds s1_ptr
-
- xmpyu
- fstds N(%r30)
- xmpyu
- fstds N(%r30)
- xmpyu
- fstds N(%r30)
- xmpyu
- fstds N(%r30)
-
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- ldws N(%r30)
- addc
- addc
- addc
- addc
- addc %r0,%r0,cy-limb
-
- ldws res_ptr
- ldws res_ptr
- ldws res_ptr
- ldws res_ptr
- add
- stws res_ptr
- addc
- stws res_ptr
- addc
- stws res_ptr
- addc
- stws res_ptr
-
- addib
-
-3. For the PA8000 we have to stick to using 32-bit limbs before compiler
- support emerges. But we want to use 64-bit operations whenever possible,
- in particular for loads and stores. It is possible to handle mpn_add_n
- efficiently by rotating (when s1/s2 are aligned), masking+bit field
- inserting when (they are not). The speed should double compared to the
- code used today.