### Abstract

Dense-matrix-vector multiplication is a well-known and important matrix calculation. It is provided as the general matrix-vector multiplication (GEMV) function in basic linear algebra subprograms (BLAS) libraries for a variety of computing hardware. Traditionally, studies have focused on multiplying one large dense matrix (a matrix whose sides are long) by a vector. However, some applications require the acceleration of numerous small dense-matrix-vector multiplications, a capability provided by batched BLAS libraries. This calculation is also needed to compute a hierarchical-matrix-vector multiplication. In this study, we implemented numerous small dense-matrix-vector multiplications on a Pascal GPU and evaluated their performance. We examined the impact of optimization parameters and succeeded in obtaining better performance than previous work. The maximum difference from our previous work is 28.47%, and from the batched GEMV of MAGMA BLAS it is up to 81.81%. Moreover, we considered using two optimization parameters in one GPU kernel: one parameter was applied to some matrices, and the second parameter was applied to the others. Although the improvement was limited (up to 5%), a performance gain was achieved. Our results will serve as a good reference for users who need numerous small dense-matrix-vector multiplications on a GPU and want to optimize matrix-vector multiplication by hand-tuning and auto-tuning.
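As a reading aid, the operation the abstract describes can be sketched on the CPU. The following is a minimal C++ reference, not the paper's GPU kernel: it computes `y = A * x` for a batch of small matrices of varying size and chooses one of two tuning values per matrix. The abstract does not say what the optimization parameters are, so everything here is an assumption made for illustration: the names `SmallGemv`, `pick_lanes`, `gemv`, and `batched_gemv`, the idea of a "lane count" that splits each row's dot product (loosely mimicking several GPU threads sharing one row), the size threshold of 16, and the lane values 4 and 32.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One small dense matrix (row-major) together with its input vector.
struct SmallGemv {
    std::size_t rows, cols;
    std::vector<double> a;  // rows * cols entries, row-major
    std::vector<double> x;  // cols entries
};

// Hypothetical tuning rule: pick one of two lane counts by matrix width.
// The threshold and both lane values are illustrative, not from the paper.
inline std::size_t pick_lanes(const SmallGemv& p, std::size_t threshold,
                              std::size_t small_lanes, std::size_t large_lanes) {
    return (p.cols < threshold) ? small_lanes : large_lanes;
}

// y = A * x. Each row's dot product is accumulated in `lanes` partial sums
// that are reduced at the end, as a group of cooperating threads might do.
// `lanes` must be at least 1.
std::vector<double> gemv(const SmallGemv& p, std::size_t lanes) {
    std::vector<double> y(p.rows, 0.0);
    std::vector<double> partial(lanes);
    for (std::size_t r = 0; r < p.rows; ++r) {
        std::fill(partial.begin(), partial.end(), 0.0);
        for (std::size_t c = 0; c < p.cols; ++c) {
            partial[c % lanes] += p.a[r * p.cols + c] * p.x[c];
        }
        double sum = 0.0;
        for (double v : partial) sum += v;  // reduce the partial sums
        y[r] = sum;
    }
    return y;
}

// Batched GEMV: every matrix in the batch gets its own result vector,
// computed with the lane count selected for that matrix.
std::vector<std::vector<double>> batched_gemv(const std::vector<SmallGemv>& batch) {
    std::vector<std::vector<double>> out;
    out.reserve(batch.size());
    for (const auto& p : batch) {
        out.push_back(gemv(p, pick_lanes(p, 16, 4, 32)));
    }
    return out;
}
```

On a GPU the loop over the batch would typically be mapped to thread blocks or thread groups instead of running sequentially, and the per-matrix selection in `pick_lanes` is where a hand-tuned or auto-tuned value would be plugged in.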

Original language | English |
---|---|
Title of host publication | Proceedings - 2019 IEEE 13th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 9-16 |
Number of pages | 8 |
ISBN (Electronic) | 9781728148823 |
DOIs | https://doi.org/10.1109/MCSoC.2019.00009 |
Publication status | Published - Oct 2019 |
Event | 13th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019 - Singapore, Singapore. Duration: Oct 1 2019 → Oct 4 2019 |

### Publication series

Name | Proceedings - 2019 IEEE 13th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019 |
---|---|

### Conference

Conference | 13th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019 |
---|---|
Country | Singapore |
City | Singapore |
Period | Oct 1 2019 → Oct 4 2019 |

### All Science Journal Classification (ASJC) codes

- Computer Networks and Communications
- Hardware and Architecture
- Electrical and Electronic Engineering
- Control and Optimization

## Fingerprint

Research topics of 'Optimization of numerous small dense-matrix-vector multiplications in H-matrix arithmetic on GPU'.

## Cite this

Optimization of numerous small dense-matrix-vector multiplications in H-matrix arithmetic on GPU. In *Proceedings - 2019 IEEE 13th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019* (pp. 9-16). [8906754] (Proceedings - 2019 IEEE 13th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/MCSoC.2019.00009