TY - GEN
T1 - Performance optimization of SpMV using CRS format by considering OpenMP scheduling on CPUs and MIC
AU - Ohshima, Satoshi
AU - Katagiri, Takahiro
AU - Matsumoto, Masaharu
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/11/6
Y1 - 2014/11/6
AB - In this study, we evaluate the performance of sparse matrix-vector multiplication (SpMV) using the compressed row storage (CRS) format on CPUs and MIC. We focus on the relationship between OpenMP scheduling and performance. We measure SpMV performance under various OpenMP scheduling settings and analyze the results, which show that OpenMP scheduling has a considerable effect on SpMV performance. We confirm that, for particular matrices, some scheduling settings improve performance compared with default scheduling. The results of the evaluation show that the performance of SpMV is improved by up to 1.57 times compared with SPARC64 IXfx, 2.47 times compared with Xeon Ivy Bridge-EP, and 2.26 times compared with Knights Corner. Next, we modify the SpMV function of OpenATLib, an auto-tuned numerical library, to include scheduling optimization as an additional SpMV implementation. We measure the performance of the GMRES solver and obtain performance improvements of up to 11.4%. These results will help to improve the performance of various numerical calculation applications.
UR - http://www.scopus.com/inward/record.url?scp=84917732692&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84917732692&partnerID=8YFLogxK
U2 - 10.1109/MCSoC.2014.43
DO - 10.1109/MCSoC.2014.43
M3 - Conference contribution
AN - SCOPUS:84917732692
T3 - Proceedings - 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014
SP - 253
EP - 260
BT - Proceedings - 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 8th IEEE International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014
Y2 - 23 September 2014 through 25 September 2014
ER -