This directory contains mpn functions for various HP PA-RISC chips. Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

On the PA7000, no memory instruction can issue in the two cycles following a
store. On the PA7100, this is reduced to one cycle.
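
The effect of the store rule above can be sketched with a toy cost model in C.
This is only an illustration under loud assumptions: the model is strictly
in-order, issues one instruction per cycle, and knows nothing about the real
pipelines beyond the rule stated above. The names insn_kind and issue_cycles
are hypothetical and exist only for this sketch.

```c
#include <stddef.h>

/* Toy instruction classes (hypothetical; real scheduling has more cases). */
enum insn_kind { INSN_OTHER, INSN_LOAD, INSN_STORE };

/* Count issue cycles in an in-order, one-instruction-per-cycle toy model.
   After a store, no memory instruction may issue for `shadow` cycles:
   shadow = 2 models the PA7000 rule, shadow = 1 the PA7100 rule. */
unsigned
issue_cycles (const enum insn_kind *seq, size_t n, unsigned shadow)
{
  unsigned cycle = 0;      /* cycle in which the next insn could issue */
  unsigned mem_ready = 0;  /* earliest cycle a memory insn may issue */
  size_t i;

  for (i = 0; i < n; i++)
    {
      int is_mem = (seq[i] != INSN_OTHER);

      if (is_mem && cycle < mem_ready)
        cycle = mem_ready;               /* stall until the shadow has passed */
      if (seq[i] == INSN_STORE)
        mem_ready = cycle + 1 + shadow;  /* block memory insns after the store */
      cycle++;                           /* this insn takes one issue slot */
    }
  return cycle;
}
```

In this model a store immediately followed by a load costs more cycles with
shadow = 2 than with shadow = 1, while placing two non-memory instructions
between them hides the penalty completely in both cases.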

The PA7100 has a lockup-free cache, so it helps to schedule each load and the
instruction that depends on it as far apart as possible.
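
The load scheduling idea above can be shown at the C level as a loop that
issues the load for the next iteration before consuming the value loaded in
the previous one. This is a minimal sketch, not code from this directory: the
function sum_pipelined is hypothetical, and a C compiler is free to reorder
the statements, which is one reason such loops are normally written in
assembly.

```c
#include <stddef.h>

/* Sum n 32-bit words.  Each load is issued one iteration before its value
   is used, so the load latency can overlap with the previous addition. */
unsigned long
sum_pipelined (const unsigned int *p, size_t n)
{
  unsigned long sum = 0;
  unsigned int cur, next;
  size_t i;

  if (n == 0)
    return 0;

  cur = p[0];            /* prologue: first load */
  for (i = 1; i < n; i++)
    {
      next = p[i];       /* load for the next iteration, issued early */
      sum += cur;        /* use of the value loaded one iteration ago */
      cur = next;
    }
  sum += cur;            /* epilogue: consume the last loaded value */
  return sum;
}
```

The prologue and epilogue are the price of separating each load from its
use; deeper pipelining would keep more than one loaded value in flight.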

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   instructions below (but some software pipelining is needed to avoid the

2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
   (asymptotically) on the PA7100, using the instructions below. With proper
   software pipelining and the unrolling level below, the speed becomes 8
   cycles/limb.

3. For the PA8000 we have to stick to 32-bit limbs until compiler support
   emerges. But we want to use 64-bit operations whenever possible, in
   particular for loads and stores. It is possible to handle mpn_add_n
   efficiently by rotating (when s1/s2 are aligned) or by masking and
   bit-field inserting (when they are not). The speed should double compared to the