The International Conference for High Performance Computing, Networking, Storage and Analysis
Designing and Auto-Tuning Parallel 3-D FFT for Computation-Communication Overlap
Authors: Sukhyun Song (University of Maryland), Jeffrey K. Hollingsworth (University of Maryland)
Best Poster Finalist
Abstract: We present a method for designing and auto-tuning a new parallel 3-D FFT code that uses the non-blocking MPI all-to-all operation. Preliminary results show that our approach maximizes computation-communication overlap and runs 3-D FFT up to 1.76x faster than the MPI-enabled FFTW library.
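
The overlap pattern named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes the data is split into tiles, uses a single-worker thread pool as a stand-in for a non-blocking all-to-all such as MPI_Ialltoall, and uses trivial placeholder functions for the per-tile computation and the exchange. The names compute_tile, exchange_tile, pipelined, and tile_size are hypothetical. It shows only the ordering of the steps (compute tile i+1 while the exchange of tile i is in flight), not any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_tile(tile):
    # Hypothetical stand-in for the per-tile FFT work: square each value.
    return [x * x for x in tile]


def exchange_tile(tile):
    # Hypothetical stand-in for a non-blocking all-to-all exchange
    # (e.g. MPI_Ialltoall). It reverses the tile so the effect is visible.
    return tile[::-1]


def pipelined(data, tile_size):
    """Compute tile i+1 while the exchange of tile i is still in flight."""
    tiles = [data[i:i + tile_size] for i in range(0, len(data), tile_size)]
    results = []
    # One worker thread plays the role of the communication engine.
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = None
        for tile in tiles:
            # Computation proceeds while the previous exchange may be running.
            computed = compute_tile(tile)
            if pending is not None:
                # Analogous to waiting on the previous request (MPI_Wait).
                results.append(pending.result())
            # Start the exchange for this tile without blocking.
            pending = comm.submit(exchange_tile, computed)
        if pending is not None:
            results.append(pending.result())
    return results


if __name__ == "__main__":
    print(pipelined([1, 2, 3, 4, 5], tile_size=2))
```

In this sketch, tile_size is the kind of parameter an auto-tuner might search over: smaller tiles give more opportunities for overlap but add per-exchange overhead. Which parameters the authors actually tune is not stated in the abstract.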