This directory contains mpn functions for various HP PA-RISC chips.  Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

  Load and Store timing

On the PA7000, no memory instruction can issue in the two cycles following
a store.  On the PA7100, this is reduced to one cycle.

The PA7100 has a lockup-free (non-blocking) cache, so it helps to schedule
each load as far as possible from the instruction that depends on it.

STATUS

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   instructions below (but some software pipelining is needed to avoid the
   xmpyu-fstds delay):

        fldds   s1_ptr

        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)

        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)

        addc
        stws    res_ptr
        addc
        stws    res_ptr

        addib   Loop

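   For reference, the arithmetic that an mpn_mul_1 loop performs can be
   sketched in portable C.  This is only an illustration under the
   assumption of 32-bit limbs; the names limb_t and ref_mul_1 are
   hypothetical and are not part of GMP.  The 64-bit product plays the
   role of xmpyu, and the carry handling plays the role of the addc chain.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumption: 32-bit limbs */

/* Hypothetical reference routine: multiply {s1, n} by the limb v,
   store the low n limbs at rp, return the most significant limb. */
static limb_t ref_mul_1(limb_t *rp, const limb_t *s1, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t p = (uint64_t)s1[i] * v;   /* 32x32 -> 64, like xmpyu */
        limb_t lo = (limb_t)p;
        limb_t hi = (limb_t)(p >> 32);
        lo += cy;                           /* add carry-in limb */
        hi += (lo < cy);                    /* propagate wrap-around */
        rp[i] = lo;
        cy = hi;                            /* high half feeds next limb */
    }
    return cy;
}
```

   The assembly outline handles two limbs per iteration; the C loop
   handles one, so it shows the data flow but not the scheduling.
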
2. mpn_addmul_1 could be improved from the current 10 cycles/limb to 7.5
   cycles/limb (asymptotically) on the PA7100, using the instructions
   below.  With proper software pipelining and the unrolling level below,
   the speed becomes 8 cycles/limb.

        fldds   s1_ptr
        fldds   s1_ptr

        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)

        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        addc
        addc
        addc
        addc
        addc    %r0,%r0,cy-limb

        ldws    res_ptr
        ldws    res_ptr
        ldws    res_ptr
        ldws    res_ptr
        add
        stws    res_ptr
        addc
        stws    res_ptr
        addc
        stws    res_ptr
        addc
        stws    res_ptr

        addib

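   The operation that an mpn_addmul_1 loop performs can likewise be
   sketched in portable C.  Again this is an illustration only, assuming
   32-bit limbs; limb_t and ref_addmul_1 are hypothetical names.  The sum
   of one product, one result limb and one carry limb always fits in 64
   bits, which is why a single carry limb per step suffices.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumption: 32-bit limbs */

/* Hypothetical reference routine: add {s1, n} * v into {rp, n} and
   return the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *s1, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so t cannot overflow. */
        uint64_t t = (uint64_t)s1[i] * v + rp[i] + cy;
        rp[i] = (limb_t)t;           /* low half goes back to the result */
        cy = (limb_t)(t >> 32);      /* high half is the next carry limb */
    }
    return cy;
}
```

   The assembly outline splits this into a multiply/carry phase and a
   separate accumulate phase over four limbs; the C loop fuses them.
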
3. For the PA8000 we have to stick to 32-bit limbs until compiler support
   emerges.  But we want to use 64-bit operations whenever possible, in
   particular for loads and stores.  It is possible to handle mpn_add_n
   efficiently by rotating (when s1/s2 are aligned) or by masking and
   bit-field inserting (when they are not).  The speed should double
   compared to the code used today.
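
   The idea of adding two 32-bit limbs per 64-bit operation can be
   sketched in portable C.  This is an illustration under stated
   assumptions, not the proposed PA8000 code: limb_t and add_n_pairs are
   hypothetical names, and each limb pair is assembled with shifts so the
   sketch does not depend on byte order or alignment.  In assembly the
   pair would come from one 64-bit load, with the rotate or mask/insert
   step putting the limbs in numeric order.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumption: 32-bit limbs */

/* Hypothetical sketch: rp = s1 + s2 over n limbs, two limbs per
   64-bit addition, returning the carry-out (0 or 1). */
static limb_t add_n_pairs(limb_t *rp, const limb_t *s1,
                          const limb_t *s2, size_t n)
{
    limb_t cy = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        /* Pack limbs i and i+1 into one 64-bit value, low limb first. */
        uint64_t a = ((uint64_t)s1[i + 1] << 32) | s1[i];
        uint64_t b = ((uint64_t)s2[i + 1] << 32) | s2[i];
        uint64_t t = a + cy;
        limb_t c1 = (t < a);         /* wrapped while adding carry-in */
        uint64_t s = t + b;
        limb_t c2 = (s < t);         /* wrapped while adding b */
        rp[i] = (limb_t)s;
        rp[i + 1] = (limb_t)(s >> 32);
        cy = c1 | c2;                /* at most one of them is set */
    }
    if (i < n) {                     /* odd limb left over */
        uint64_t s = (uint64_t)s1[i] + s2[i] + cy;
        rp[i] = (limb_t)s;
        cy = (limb_t)(s >> 32);
    }
    return cy;
}
```

   Halving the number of additions and memory operations in this way is
   the source of the expected doubling in speed.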