ghc/runtime/gmp/TODO

THINGS TO WORK ON

Note that many of the things mentioned here are already fixed in GMP 2.0.

* Improve speed for non-gcc compilers by defining umul_ppmm, udiv_qrnnd,
  etc., to call __umul_ppmm, __udiv_qrnnd.  A typical definition for
  umul_ppmm, where __umul_ppmm returns the low word of the product and
  stores the high word through its first argument, would be

      #define umul_ppmm(ph, pl, m0, m1) \
        do { unsigned long __ph; \
             (pl) = __umul_ppmm (&__ph, (m0), (m1)); \
             (ph) = __ph; } while (0)

  In order to maintain just one version of longlong.h (shared by GMP and
  GCC), this has to be done outside of longlong.h.

* Change the mpn routines to not deal with normalization?
    mpn_add: Unchanged.
    mpn_sub: Remove the normalization loop.  Does it assume normalized
             input?
    mpn_mul: Make it return the most significant limb, to simplify
             normalization.  Karatsuba's algorithm will be greatly
             simplified if mpn_add and mpn_sub don't normalize their
             results.
    mpn_div: Still requires strict normalization.
  Beware of problems with mpn_cmp (and similar routines): a larger size
  does not ensure that an operand is larger, since it may be "less
  normalized".  Normalization would have to be moved into the mpz
  functions.

Bennet Yee at CMU proposes:
* mpz_{put,get}_raw for memory-oriented I/O, like the other *_raw
  functions.
* A function mpfatal that is called for exceptions.  The user may override
  the default definition.

* mout should print its output in groups of 10 digits.
* ASCII dependence?
* Error reporting from I/O functions (linkoping)?

* Make all computational mpz_* functions return a signed int indicating
  whether the result was zero, positive, or negative?

* Implement mpz_cmpabs, mpz_xor, mpz_to_double, mpz_to_si, mpz_lcm,
  mpz_dpb, mpz_ldb, and various bit-string operations such as
  mpz_cntbits.  Also mpz_@_si variants for most operations @?

Brian Beuning proposes:
   1. An array of small primes
   3. A function to factor an MINT
   4. A routine to look for "small" divisors of an MINT
   5. A 'multiply mod n' routine based on Montgomery's algorithm.

Doug Lea proposes:
   1. A way to find out if an integer fits into a signed int, and if so, a
      way to convert it.
   2. Similarly for double precision float conversion.
   3. A function to convert the ratio of two integers to a double.  This
      can be useful for mixed mode operations with integers, rationals, and
      doubles.
   5. Bit-setting, clearing, and testing operations, as in
          mpz_setbit(MP_INT* dest, MP_INT* src, unsigned long bit_number),
      used, for example, as in
          mpz_setbit(x, x, 123)
      to set bit number 123 of x directly.
      If these are supported, you don't first have to set up an otherwise
      unnecessary mpz holding a shifted value and then do an "or"
      operation.

The elliptic curve method is described in the chapter `Algorithms in
Number Theory' in the Handbook of Theoretical Computer Science, Elsevier,
Amsterdam, 1990.  It is also covered in Carl Pomerance's lecture notes on
Cryptology and Computational Number Theory, 1990.

* New function: mpq_get_ifstr (int_str, frac_str, base,
  precision_in_some_way, rational_number).  Convert RATIONAL_NUMBER to a
  string in BASE and put the integer part in INT_STR and the fractional
  part in FRAC_STR.  (This function would do a division of the numerator
  by the denominator.)

* Should mpz_powm* handle negative exponents?

* udiv_qrnnd: If the denominator is normalized, the n0 argument has very
  little effect on the quotient.  Maybe we can assume it is 0, and
  compensate at a later stage?

* Better sqrt: First calculate the reciprocal square root, then multiply
  it by the operand to get the square root.  The reciprocal square root
  can be obtained through Newton-Raphson without division.  The iteration
  is x := x*(3-a*x^2)/2, where a is the operand.

* Newton-Raphson using multiplication: each iteration doubles the number
  of correct digits.  So if we square x(k) as part of the iteration, the
  leading digits of the result will agree with the entire result from
  iteration k-1, and only the remaining low part needs to be computed.
  A _mpn_mul_lowpart routine could implement this.

* Peter Montgomery: If 0 <= a, b < p < 2^31 and I want a modular product
  a*b modulo p and the long long type is unavailable, then I can write

          typedef   signed long slong;
          typedef unsigned long ulong;
          slong a, b, p, quot, rem;

          /* Rounded estimate: floor(a*b/p) or one more.  */
          quot = (slong) (0.5 + (double)a * (double)b / (double)p);
          /* Exact a*b - p*quot; the unsigned arithmetic may wrap, but
             the true value is in [-p, p), so the cast recovers it.  */
          rem =  (slong)((ulong)a * (ulong)b - (ulong)p * (ulong)quot);
          if (rem < 0) {rem += p; quot--;}

FFT:
{
  * Multiplication could be done with Montgomery's method combined with
    the "three primes" method described in Lipson.  Maybe this would be
    faster than Nussbaumer's method with 3 (simple) moduli?

  * Maybe the modular tricks below are not needed: We are using very
    special numbers, Fermat numbers with a small base and a large exponent,
    and maybe it's possible to just subtract and add?

  * Modify Nussbaumer's convolution algorithm to use 3 words for each
    coefficient, calculating in 3 relatively prime moduli (e.g.
    0xffffffff, 0x100000000, and 0x7fff on a 32-bit computer).  Both the
    coefficient arithmetic and the CRR (the Chinese remainder step that
    combines the three residues) would be very fast with such numbers.

  * Optimize the Schoenhage-Strassen multiplication algorithm.  Take
    advantage of the real-valued input to save half of the operations and
    half of the memory.  Try recursive variants with large, optimized base
    cases.  Use a recursive FFT with large base cases, since a recursive
    FFT has better memory locality.  A normal FFT gets a 100% cache miss
    rate.
}

* Speed up modular arithmetic using Montgomery's method or my
  pre-inversion method.  In either case, special arithmetic calls would be
  needed (mpz_mmmul, mpz_mmadd, mpz_mmsub), plus some kind of
  initialization functions.

* mpz_powm* should not use division to reduce the result in the loop, but
  instead pre-compute the reciprocal of the MOD argument and do
  reduced_val = val - floor(val*reciprocal(MOD))*MOD, or use Montgomery's
  method.

* mpz_mod_2expplussi -- to reduce a bignum modulo (2**n)+s

* It would be quite an important feature never to allocate more memory
  than is really necessary for a result.  Sometimes we can achieve this
  cheaply, by deferring reallocation until the result size is known.

* New macro in longlong.h: shift_rhl, which extracts a word by shifting
  two words as a unit.  (Supported by i386, i860, HP-PA, RS6000, 29k.)
  Useful for shifting multiple precision numbers.

* The installation procedure should make a test run of multiplication to
  decide the threshold values for switching between the available
  algorithms.

* The gcd algorithm could probably be improved with a divide-and-conquer
  (DAC) approach.  At least the bulk of the operations should be done with
  single precision.

* Fast output conversion of x to base B:
    1. Find n such that B^n > x.
    2. Set y to (x*2^m)/(B^n), where m is large enough to make 2^m ~~ B^n.
    3. Multiply the low half of y by B^(n/2), and recursively convert the
       result.  Truncate the low half of y and convert that recursively.
  Complexity: O(M(n)log(n))+O(D(n))!

* Extensions for floating-point arithmetic.

* Improve special cases for division.

  1. When the divisor is just one word, normalization is not needed for
  most CPUs, and can be done in the division loop for CPUs that need
  normalization.

  2. Also, when the quotient is going to be very small (i.e. nsize-dsize
  is small), normalization should be done in the division loop.

  To fix this, a new routine mpn_div_unnormalized is needed.

* Never allocate temporary space for a source param that overlaps with a
  destination param needing reallocation.  Instead, malloc a new block for
  the destination (and free the source before returning to the caller).

* When any of the source operands overlap with the destination, mult (and
  other routines) slow down.  This is because of the need for temporary
  allocation (with alloca) and copying.  If a new destination were
  malloc'ed instead (and the overlapping source free'd before return), no
  copying would be needed.  Is GNU malloc quick enough to make this faster
  even for reasonably small operands?
\f
Local Variables:
mode: text
fill-column: 75
version-control: never
End: