(MRR May 10, 2000)
Files in the linpack directory

linpackc.c
    The original LINPACK source from the Netlib website.
    To compile it on a standard unix target:

linpack_unix.c
    For some system configurations, the original source will not compile
    or link. This version has been modified to compile on grinch, which
    does not have "floor" in its standard math library. To compile:
        gcc -lm -DSP -DUNROLL -O4 linpack_unix.c

linpackev.c
    This version has been modified to run on Excimer and Maximer. The
    modifications can be highlighted by running:
        diff linpackc.c linpackev.c

makefile
    The MetaWare makefile. It can be modified to choose any combination of
    options, where DSP is for single precision, DDP is for double
    precision, DROLL is for iterative loops, and DUNROLL is for unrolled
    loops. This MetaWare version does not support AltiVec.

makefile_gcc
    The GNU makefile for AltiVec-enhanced gcc. It supports the previous
    options as well as DALTIVEC, for an optimized LINPACK execution.

______________________________________________________________________________
AltiVec alignment method in linpackev.c
------------------------------------------------------------------------------
All three LINPACK functions modified to use AltiVec use the following
methodology to align single-precision floating-point data structures. The
structure is broken into three segments: pre-aligned, vector operations, and
post-aligned. The corresponding variables (pre_align, vops, post_align)
should add up to the total size (n). The worst-case data structure would
start at 0x??4 and end at 0x??C, doing at most 6 non-vectored operations.
The best case would use only AltiVec instructions.

The following is generic code, leaving out the specific algorithm being
implemented:

    xalign = alignment(dx);
    nxaligndiff = n-xalign;
    pre_align = (nxaligndiff < 0) ? n : xalign;

    if ( pre_align != 0) {
        for (i = 0; i < pre_align; i++)
            ...
            //pre-aligned method similar to non-AltiVec method
            ...
    }

    vops = 4*(nxaligndiff/4);
    post_align = nxaligndiff % 4;

    if ( vops > 0 ) {
        ...
        //All iterated AltiVec set-up operations
        ...
        for (i = xalign; i < vops; i += 4)
            ...
            //All AltiVec method operations
            ...
    }

    if ( post_align > 0) {
        for (i = n-post_align; i < n; i++)
            ...
            //post-aligned method similar to non-AltiVec method
            ...
    }

______________________________________________________________________________
LINPACK idamax() method description for AltiVec
------------------------------------------------------------------------------
The LINPACK idamax() routine returns the index of the element with the
largest absolute value in an array of single-precision floats. The
AltiVec-enhanced version uses the above alignment method. The following
describes one iteration of the search in the middle (AltiVec) section:

Given: a max value, a test vector
1) take the absolute value of the vector (vec_abs)
2) are any in this vector greater than my given max? (vec_any_gt)
   a) No  - go to next vector
   b) Yes
      1) are all elements in the vector less than or equal to the zeroth
         element? (vec_splat_float and vec_all_le)
         a) Yes - this is the max because none of the others are bigger
                  set zeroth element as max
                  go to next vector
         b) No
            1) are all elements in the vector less than or equal to the
               first element? (vec_splat_float and vec_all_le)
               a) Yes - this is the max because none of the others are
                        bigger
                        set first element as max
                        go to next vector
               b) No
                  1) are all elements in the vector less than or equal to
                     the second element? (vec_splat_float and vec_all_le)
                     a) Yes - this is the max because none of the others
                              are bigger
                              set second element as max
                              go to next vector
                     b) No  - the third is the max because that's the only
                              option left
                              set third element as max
                              go to next vector

Best case:    decreasing value data
Worst case:   increasing value data
Average case: random data
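

The three-segment split and the idamax() search steps described above can be
sketched in portable C. This is not the code from linpackev.c: it uses no
AltiVec intrinsics and emulates each vector step with a 4-element scalar loop
so that it runs on any target. The names alignment_sketch, scalar_step,
vector_step and idamax_sketch are hypothetical. The sketch assumes that
alignment() returns the number of floats that precede the next 16-byte
boundary, and that the routine returns -1 when n < 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static float absf(float v) { return v < 0.0f ? -v : v; }

/* Assumed meaning of alignment(): floats before the next 16-byte boundary.
   Assumes x is at least 4-byte aligned, so the result is 0..3. */
static int alignment_sketch(const float *x)
{
    uintptr_t misalign = (uintptr_t)x % 16u;
    return misalign ? (int)((16u - misalign) / sizeof(float)) : 0;
}

/* Non-vectored step used by the pre-aligned and post-aligned segments. */
static void scalar_step(const float *x, int i, float *max, int *imax)
{
    float a = absf(x[i]);
    if (a > *max) { *max = a; *imax = i; }
}

/* Scalar emulation of one iteration of the middle section on x[i..i+3]. */
static void vector_step(const float *x, int i, float *max, int *imax)
{
    float v[4];
    int lane, k, any_gt = 0;

    for (k = 0; k < 4; k++) {
        v[k] = absf(x[i + k]);           /* step 1: vec_abs */
        any_gt |= (v[k] > *max);         /* step 2: vec_any_gt */
    }
    if (!any_gt)
        return;                          /* 2a: go to next vector */

    /* 2b: test lanes 0, 1, 2 in turn; lane 3 is the only option left. */
    for (lane = 0; lane < 3; lane++) {
        int all_le = 1;                  /* all elements <= splat of v[lane] */
        for (k = 0; k < 4; k++)
            all_le &= (v[k] <= v[lane]);
        if (all_le)
            break;
    }
    *max = v[lane];
    *imax = i + lane;
}

int idamax_sketch(const float *x, int n)
{
    int i, imax = 0;
    int xalign, nxaligndiff, pre_align, vops, post_align;
    float max;

    if (n < 1)
        return -1;                       /* assumption, see lead-in */
    assert(x != NULL);

    xalign = alignment_sketch(x);
    nxaligndiff = n - xalign;
    pre_align = (nxaligndiff < 0) ? n : xalign;
    vops = (nxaligndiff > 0) ? 4 * (nxaligndiff / 4) : 0;
    post_align = (nxaligndiff > 0) ? nxaligndiff % 4 : 0;

    max = absf(x[0]);
    for (i = 0; i < pre_align; i++)              /* pre-aligned segment */
        scalar_step(x, i, &max, &imax);
    for (i = xalign; i < xalign + vops; i += 4)  /* vector segment */
        vector_step(x, i, &max, &imax);
    for (i = n - post_align; i < n; i++)         /* post-aligned segment */
        scalar_step(x, i, &max, &imax);
    return imax;
}
```

The sketch writes the vector loop bound as xalign + vops to make the covered
range explicit. Because every comparison against the running max is strict and
lanes are tested from zero upward, the first occurrence of the largest absolute
value is the one reported, whatever the starting alignment of the array.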