Optimization of numerous small dense-matrix-vector multiplications in H-matrix arithmetic on GPU

Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota

    Research output: Contribution to book/report › Conference contribution

    3 Citations (Scopus)

    Abstract

    Dense-matrix-vector multiplication is a well-known and important matrix operation. It is provided as the general matrix-vector multiplication (GEMV) function in the Basic Linear Algebra Subprograms (BLAS) libraries for many kinds of computing hardware. Traditionally, studies have focused on multiplying a single large dense matrix (one whose dimensions are large) by a vector. However, some applications require fast execution of numerous small dense-matrix-vector multiplications. This functionality is provided by batched BLAS libraries, and the same computation is also needed to compute a hierarchical-matrix (H-matrix) vector multiplication. In this study, we implemented numerous small dense-matrix-vector multiplications on a Pascal GPU and evaluated their performance. We examined the impact of the optimization parameters and obtained better performance than previous works: the maximum performance gain is 28.47% over our previous work and up to 81.81% over the batched GEMV of MAGMA BLAS. Moreover, we considered using two optimization parameters in one GPU kernel, applying one parameter to some of the matrices and the second parameter to the others. Although the improvement was limited (up to 5%), this approach did achieve a performance gain. Our results will serve as a useful reference for users who need numerous small dense-matrix-vector multiplications on a GPU and want to optimize matrix-vector multiplication by hand-tuning or auto-tuning.
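    To illustrate why an H-matrix-vector product turns into many small GEMVs, here is a minimal CPU reference sketch, not the authors' GPU kernel. It assumes a hypothetical flat list of leaf blocks: dense leaves stored explicitly as D, and low-rank leaves stored in the common factored form U V^T, so each low-rank leaf costs two small GEMVs (t = V^T x_j, then U t). The names gemv, batched_gemv and hmatvec and the block-tuple layout are illustrative assumptions. On a GPU, each pair in a batch would be mapped to its own thread block or warp, which is where tuning parameters such as the ones the abstract mentions come into play.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def gemv(a: Matrix, x: Vector) -> Vector:
    """One small dense matrix-vector product y = A x (row-major lists)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def batched_gemv(mats: List[Matrix], vecs: List[Vector]) -> List[Vector]:
    """Independent GEMVs, one per (matrix, vector) pair; shapes may differ.
    A GPU kernel would give each pair its own thread block or warp;
    this reference version simply loops over the batch."""
    return [gemv(a, x) for a, x in zip(mats, vecs)]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def hmatvec(n: int,
            dense_blocks: List[Tuple[int, int, Matrix]],
            lowrank_blocks: List[Tuple[int, int, Matrix, Matrix]],
            x: Vector) -> Vector:
    """y = H x for a hypothetical flat list of H-matrix leaf blocks.
    dense_blocks:   (row0, col0, D)    block stored explicitly as D.
    lowrank_blocks: (row0, col0, U, V) block stored as U V^T."""
    y = [0.0] * n
    # Dense leaves: a single batch of small GEMVs.
    outs = batched_gemv([d for _, _, d in dense_blocks],
                        [x[c0:c0 + len(d[0])] for _, c0, d in dense_blocks])
    for (r0, _, _), part in zip(dense_blocks, outs):
        for i, val in enumerate(part):
            y[r0 + i] += val
    # Low-rank leaves: two batches, t = V^T x_j and then U t.
    ts = batched_gemv([transpose(v) for _, _, _, v in lowrank_blocks],
                      [x[c0:c0 + len(v)] for _, c0, _, v in lowrank_blocks])
    outs = batched_gemv([u for _, _, u, _ in lowrank_blocks], ts)
    for (r0, _, _, _), part in zip(lowrank_blocks, outs):
        for i, val in enumerate(part):
            y[r0 + i] += val
    return y


# 4x4 example: dense diagonal 2x2 leaves, rank-1 off-diagonal leaves.
dense_blocks = [(0, 0, [[1.0, 2.0], [3.0, 4.0]]),
                (2, 2, [[5.0, 6.0], [7.0, 8.0]])]
lowrank_blocks = [(0, 2, [[1.0], [2.0]], [[1.0], [1.0]]),
                  (2, 0, [[1.0], [0.0]], [[2.0], [3.0]])]
x = [1.0, 2.0, 3.0, 4.0]
y = hmatvec(4, dense_blocks, lowrank_blocks, x)
print(y)  # [12.0, 25.0, 47.0, 53.0]
```

    The example corresponds to the full matrix [[1, 2, 1, 1], [3, 4, 2, 2], [2, 3, 5, 6], [0, 0, 7, 8]], so the result can be checked against a single large GEMV. Note that the batch mixes matrices of different shapes (2x2, 1x2 and 2x1), which is typical of H-matrix leaves and is one reason a single set of kernel parameters may not suit every matrix in the batch.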

    Original language: English
    Host publication title: Proceedings - 2019 IEEE 13th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 9-16
    Number of pages: 8
    ISBN (Electronic): 9781728148823
    DOI
    Publication status: Published - Oct 2019
    Event: 13th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019 - Singapore, Singapore
    Duration: Oct 1 2019 → Oct 4 2019

    Publication series

    Name: Proceedings - 2019 IEEE 13th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019

    Conference

    Conference: 13th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019
    Country/Territory: Singapore
    City: Singapore
    Period: 10/1/19 → 10/4/19

    All Science Journal Classification (ASJC) codes

    • Computer Networks and Communications
    • Hardware and Architecture
    • Electrical and Electronic Engineering
    • Control and Optimization
