+++ /dev/null
-
- INTEL PENTIUM P5 MPN SUBROUTINES
-
-
-This directory contains mpn functions optimized for Intel Pentium (P5,P54)
-processors. The mmx subdirectory has code for Pentium with MMX (P55).
-
-
-STATUS
-
- cycles/limb
-
- mpn_add_n/sub_n 2.375
-
- mpn_copyi/copyd 1.0
-
- mpn_divrem_1 44.0
- mpn_mod_1 44.0
- mpn_divexact_by3 15.0
-
- mpn_l/rshift 5.375 normal (6.0 on P54)
- 1.875 special shift by 1 bit
-
- mpn_mul_1 13.0
- mpn_add/submul_1 14.0
-
- mpn_mul_basecase 14.2 cycles/crossproduct (approx)
-
- mpn_sqr_basecase 8 cycles/crossproduct (approx)
- or 15.5 cycles/triangleproduct (approx)
-
-Pentium MMX gets the following improvements
-
- mpn_l/rshift 1.75
-
-
-1. mpn_lshift and mpn_rshift run at about 6 cycles/limb on P5 and P54, but the
-documentation indicates that they should take only 43/8 = 5.375 cycles/limb,
-or 5 cycles/limb asymptotically. The P55 runs them at the expected speed.
-
-2. mpn_add_n and mpn_sub_n run at asymptotically 2 cycles/limb. Due to loop
-overhead and other delays (cache refill?), they run at or near 2.5 cycles/limb.
-
-3. mpn_mul_1, mpn_addmul_1, mpn_submul_1 all run 1 cycle faster than they
-should. Intel documentation says a mul instruction is 10 cycles, but it
-measures 9 and the routines using it run with it as 9.
-
-
-
-RELEVANT OPTIMIZATION ISSUES
-
-1. Pentium doesn't allocate cache lines on writes, unlike most other modern
-processors. Since the functions in the mpn class do array writes, we have to
-handle allocating the destination cache lines by reading a word from it in the
-loops, to achieve the best performance.
-
-2. Pairing of memory operations requires that the two issued operations refer
-to different cache banks. The simplest way to insure this is to read/write
-two words from the same object. If we make operations on different objects,
-they might or might not be to the same cache bank.
-
-
-
-REFERENCES
-
-"Intel Architecture Optimization Manual", 1997, order number 242816. This
-is mostly about P5, the parts about P6 aren't relevant. Available on-line:
-
- http://download.intel.com/design/PentiumII/manuals/242816.htm
-
-
-
-----------------
-Local variables:
-mode: text
-fill-column: 76
-End: