### Abstract

The demand for dense matrix computation in large-scale and complex simulations is increasing; however, the memory capacity of current computer systems is insufficient for such simulations. The hierarchical matrix method (H-matrices) is attracting attention as a computational method that can reduce the memory requirements of dense matrix computations. However, H-matrix computation is more complex than dense or sparse matrix computation, so it needs to be accelerated. We focus on H-matrix-vector multiplication (HMVM) on a single NVIDIA Tesla P100 GPU. We implement five GPU kernels and compare their execution times with those of OpenMP implementations on several processors (Broadwell-EP, Skylake-SP, and Knights Landing). The results show that, although HMVM consists of many small GEMV computations, merging them into a single GPU kernel was the most effective implementation. Moreover, the performance of batched BLAS in the MAGMA library was comparable to that of the manually tuned GPU kernel.
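
The decomposition of HMVM into many small GEMV computations can be illustrated on the CPU. The following is a minimal sketch, not the paper's implementation. It assumes a flat list of leaf blocks, each stored either densely or in low-rank form U V^T, which is a common H-matrix representation. The class and function names (`DenseLeaf`, `LowRankLeaf`, `gemv`, `hmvm`) are hypothetical, and the sketch does not model the paper's five GPU kernels.

```python
from dataclasses import dataclass
from typing import List, Union

Matrix = List[List[float]]


@dataclass
class DenseLeaf:
    r0: int        # first row of the block in the full matrix
    c0: int        # first column of the block in the full matrix
    block: Matrix  # m x n entries stored explicitly


@dataclass
class LowRankLeaf:
    r0: int
    c0: int
    U: Matrix  # m x k factor
    V: Matrix  # n x k factor; the block equals U @ V^T


Leaf = Union[DenseLeaf, LowRankLeaf]


def gemv(M: Matrix, v: List[float]) -> List[float]:
    """Small dense matrix-vector product M v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def transpose(M: Matrix) -> Matrix:
    return [list(col) for col in zip(*M)]


def hmvm(leaves: List[Leaf], x: List[float], n_rows: int) -> List[float]:
    """y = A x, accumulated leaf by leaf as a sequence of small GEMVs."""
    y = [0.0] * n_rows
    for leaf in leaves:
        if isinstance(leaf, DenseLeaf):
            n = len(leaf.block[0])
            part = gemv(leaf.block, x[leaf.c0:leaf.c0 + n])
        else:
            n = len(leaf.V)
            t = gemv(transpose(leaf.V), x[leaf.c0:leaf.c0 + n])  # V^T x, length k
            part = gemv(leaf.U, t)                               # U (V^T x)
        # Leaves that share a row range add into the same slice of y.
        for i, value in enumerate(part):
            y[leaf.r0 + i] += value
    return y


# Toy 4 x 4 matrix split into four 2 x 2 leaves: dense diagonal blocks,
# rank-1 off-diagonal blocks.
leaves = [
    DenseLeaf(0, 0, [[1, 2], [3, 4]]),
    LowRankLeaf(0, 2, U=[[1], [2]], V=[[1], [1]]),  # block [[1, 1], [2, 2]]
    LowRankLeaf(2, 0, U=[[1], [0]], V=[[2], [3]]),  # block [[2, 3], [0, 0]]
    DenseLeaf(2, 2, [[5, 6], [7, 8]]),
]
y = hmvm(leaves, [1, 2, 3, 4], 4)
print(y)  # → [12.0, 25.0, 47.0, 53.0]
```

Each loop iteration is an independent small GEMV (or a pair of thin GEMVs for a low-rank leaf). On a GPU, one option is to launch a separate kernel per leaf. The alternative is to process all leaves in one merged kernel or one batched call, which is the trade-off the abstract evaluates. Because several leaves write into the same rows of y, a merged kernel must handle those overlapping updates, for example with atomic additions or by assigning row ranges to thread blocks.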

Original language | English |
---|---|
Title of host publication | Supercomputing Frontiers - 4th Asian Conference, SCFA 2018, Proceedings |
Editors | Rio Yokota, Weigang Wu |
Publisher | Springer Verlag |
Pages | 274-292 |
Number of pages | 19 |
ISBN (Print) | 9783319699523 |
Publication status | Published - Jan 1 2018 |
Event | 4th Asian Conference on Supercomputing Frontiers, SCFA 2018 - Singapore, Singapore. Duration: Mar 26 2018 → Mar 29 2018 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 10776 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

### Conference

Conference | 4th Asian Conference on Supercomputing Frontiers, SCFA 2018 |
---|---|
Country | Singapore |
City | Singapore |
Period | 3/26/18 → 3/29/18 |

