TY - GEN
T1 - Optimization of hierarchical matrix computation on GPU
AU - Ohshima, Satoshi
AU - Yamazaki, Ichitaro
AU - Ida, Akihiro
AU - Yokota, Rio
N1 - Funding Information:
This work was partially supported by JSPS KAKENHI Grant Number 17H01749, JST/CREST, German Priority Programme 1648 Software for Exascale Computing (SPPEXA-II), and “Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures” and “High Performance Computing Infrastructure” in Japan (Project ID: jh160041). Computations were primarily performed using the computer facilities at the Information Technology Center, The University of Tokyo (Reedbush), the Research Institute for Information Technology, Kyushu University (ITO), and JCAHPC (Oakforest-PACS).
PY - 2018
Y1 - 2018
N2 - The demand for dense matrix computation in large-scale and complex simulations is increasing; however, the memory capacity of current computer systems is insufficient for such simulations. The hierarchical matrix method (H-matrices) is attracting attention as a computational method that can reduce the memory requirements of dense matrix computations. However, the computation of H-matrices is more complex than that of dense and sparse matrices; thus, accelerating H-matrix computations is required. We focus on H-matrix-vector multiplication (HMVM) on a single NVIDIA Tesla P100 GPU. We implement five GPU kernels and compare their execution times with those of OpenMP implementations on various processors (Broadwell-EP, Skylake-SP, and Knights Landing). The results show that, although HMVM can be computed as many small GEMV kernels, merging them into a single GPU kernel was the most effective implementation. Moreover, the performance of BATCHED BLAS in the MAGMA library was comparable to that of the manually tuned GPU kernel.
AB - The demand for dense matrix computation in large-scale and complex simulations is increasing; however, the memory capacity of current computer systems is insufficient for such simulations. The hierarchical matrix method (H-matrices) is attracting attention as a computational method that can reduce the memory requirements of dense matrix computations. However, the computation of H-matrices is more complex than that of dense and sparse matrices; thus, accelerating H-matrix computations is required. We focus on H-matrix-vector multiplication (HMVM) on a single NVIDIA Tesla P100 GPU. We implement five GPU kernels and compare their execution times with those of OpenMP implementations on various processors (Broadwell-EP, Skylake-SP, and Knights Landing). The results show that, although HMVM can be computed as many small GEMV kernels, merging them into a single GPU kernel was the most effective implementation. Moreover, the performance of BATCHED BLAS in the MAGMA library was comparable to that of the manually tuned GPU kernel.
UR - http://www.scopus.com/inward/record.url?scp=85044440202&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85044440202&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-69953-0_16
DO - 10.1007/978-3-319-69953-0_16
M3 - Conference contribution
AN - SCOPUS:85044440202
SN - 9783319699523
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 274
EP - 292
BT - Supercomputing Frontiers - 4th Asian Conference, SCFA 2018, Proceedings
A2 - Yokota, Rio
A2 - Wu, Weigang
PB - Springer Verlag
T2 - 4th Asian Conference on Supercomputing Frontiers, SCFA 2018
Y2 - 26 March 2018 through 29 March 2018
ER -