TY - GEN
T1 - Auto-tuning of hybrid MPI/OpenMP execution with code selection by ppOpen-AT
AU - Katagiri, Takahiro
AU - Matsumoto, Masaharu
AU - Ohshima, Satoshi
PY - 2016/7/18
Y1 - 2016/7/18
AB - In this paper, we propose an effective kernel implementation for a finite difference method (FDM) application that merges the computations of the central-difference and explicit time expansion schemes without IF statements inside the loops. The effectiveness of the implementation depends on the CPU architecture and on execution conditions such as the problem size and the number of MPI processes and OpenMP threads. We adopt auto-tuning (AT) technology to select the best implementation. The AT function for this selection, referred to as "code selection", is implemented in the AT language ppOpen-AT. Experiments on current advanced CPUs (Xeon Phi, Ivy Bridge, and FX10) indicate that code selection achieves substantial speedups over conventional AT. In particular, the heaviest kernels achieved speedups of 4.21x (Xeon Phi), 2.52x (Ivy Bridge), and 2.03x (FX10).
UR - http://www.scopus.com/inward/record.url?scp=84991672371&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84991672371&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW.2016.49
DO - 10.1109/IPDPSW.2016.49
M3 - Conference contribution
AN - SCOPUS:84991672371
T3 - Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
SP - 1488
EP - 1495
BT - Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016
Y2 - 23 May 2016 through 27 May 2016
ER -