The (poorly optimized) code in this directory was originally written for a
j90 system, but finished on a c90.  It should work on all Cray vector
computers.  For the T3E and T3D systems, the `alpha' subdirectory at the
same level as the directory containing this file, is much better.

* `+' seems to be faster than `|' when combining carries.

* It is possible that the best multiply performance would be achived by
  storing only 24 bits per element, and using lazy carry propagation.  Before
  calling i24mult, full carry propagation would be needed.

* Supply tasking versions of the C loops.