Low byte/flop implementation of iterative solver for sparse matrices derived from stencil computations

Kenji Ono, Shuichi Chiba, Shunsuke Inoue, Kazuo Minami

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Practical simulators, especially in computational fluid dynamics, require high-performance iterative methods and efficient treatment of boundary conditions. In this paper, we propose a novel bit-representation technique to enhance the performance of such simulators. The technique is applied to an iterative kernel implementation that handles various boundary conditions in a stencil computation on a structured grid system. Owing to the bit representation and compression of matrix elements, this approach reduces traffic from main memory to the CPU and makes effective use of the cache and the Single Instruction–Multiple Data (SIMD) units. The proposed implementation also replaces if-branch statements with mask operations on the bit representation, which facilitates code optimization at compile time and at runtime. To evaluate the performance of the proposed implementation, we employ the Red–Black SOR and BiCGstab algorithms. Experimental results show that the proposed approach is up to 3.5 times faster than a naïve implementation on both the Intel and Fujitsu SPARC architectures.

Original language: English
Title of host publication: High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Revised Selected Papers
Editors: Osni Marques, Michel Dayde, Kengo Nakajima
Publisher: Springer Verlag
Pages: 192-205
Number of pages: 14
ISBN (Print): 9783319173528
DOIs: https://doi.org/10.1007/978-3-319-17353-5_17
Publication status: Published - Jan 1 2015
Externally published: Yes
Event: 11th International Conference on High Performance Computing for Computational Science, VECPAR 2014 - Eugene, United States
Duration: Jun 30 2014 to Jul 3 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8969
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th International Conference on High Performance Computing for Computational Science, VECPAR 2014
Country: United States
City: Eugene
Period: 6/30/14 to 7/3/14

Fingerprint

Iterative Solver
Sparse matrix
Simulators
Boundary conditions
Iterative methods
Program processors
Masks
Computational fluid dynamics
Simulator
Data storage equipment
Compilation
Data Streams
Computational Fluid Dynamics
Cache
Mask
Branch
Compression
High Performance
Traffic
kernel

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Ono, K., Chiba, S., Inoue, S., & Minami, K. (2015). Low byte/flop implementation of iterative solver for sparse matrices derived from stencil computations. In O. Marques, M. Dayde, & K. Nakajima (Eds.), High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Revised Selected Papers (pp. 192-205). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8969). Springer Verlag. https://doi.org/10.1007/978-3-319-17353-5_17

Low byte/flop implementation of iterative solver for sparse matrices derived from stencil computations. / Ono, Kenji; Chiba, Shuichi; Inoue, Shunsuke; Minami, Kazuo.

High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Revised Selected Papers. ed. / Osni Marques; Michel Dayde; Kengo Nakajima. Springer Verlag, 2015. p. 192-205 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8969).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Ono, K, Chiba, S, Inoue, S & Minami, K 2015, Low byte/flop implementation of iterative solver for sparse matrices derived from stencil computations. in O Marques, M Dayde & K Nakajima (eds), High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8969, Springer Verlag, pp. 192-205, 11th International Conference on High Performance Computing for Computational Science, VECPAR 2014, Eugene, United States, 6/30/14. https://doi.org/10.1007/978-3-319-17353-5_17
Ono K, Chiba S, Inoue S, Minami K. Low byte/flop implementation of iterative solver for sparse matrices derived from stencil computations. In Marques O, Dayde M, Nakajima K, editors, High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Revised Selected Papers. Springer Verlag. 2015. p. 192-205. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-17353-5_17
Ono, Kenji ; Chiba, Shuichi ; Inoue, Shunsuke ; Minami, Kazuo. / Low byte/flop implementation of iterative solver for sparse matrices derived from stencil computations. High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Revised Selected Papers. editor / Osni Marques ; Michel Dayde ; Kengo Nakajima. Springer Verlag, 2015. pp. 192-205 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{c2c922d12c874159bf3052129bc77109,
title = "Low byte/flop implementation of iterative solver for sparse matrices derived from stencil computations",
abstract = "Practical simulators require high-performance iterative methods and efficient boundary conditions, especially in the field of computational fluid dynamics. In this paper, we propose a novel bit-representation technique to enhance the performance of such simulators. The technique is applied to an iterative kernel implementation that treats various boundary conditions in a stencil computation on a structured grid system. This approach reduces traffic from the main memory to CPU, and effectively utilizes Single Instruction–Multiple Data (SIMD) stream units with cache because of the bit-representation and compression of matrix elements. The proposed implementation also replaces if-branch statements with mask operations using the bit expression. This promotes the optimization of code during compilation and runtime. To evaluate the performance of the proposed implementation, we employ the Red–Black SOR and BiCGstab algorithms. Experimental results show that the proposed approach is up to 3.5 times faster than a na{\"i}ve implementation on both the Intel and Fujitsu Sparc architectures.",
author = "Kenji Ono and Shuichi Chiba and Shunsuke Inoue and Kazuo Minami",
year = "2015",
month = "1",
day = "1",
doi = "10.1007/978-3-319-17353-5_17",
language = "English",
isbn = "9783319173528",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "192--205",
volume = "8969",
editor = "Osni Marques and Michel Dayde and Kengo Nakajima",
booktitle = "High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Revised Selected Papers",
address = "Germany",

}

TY  - GEN
T1  - Low byte/flop implementation of iterative solver for sparse matrices derived from stencil computations
AU  - Ono, Kenji
AU  - Chiba, Shuichi
AU  - Inoue, Shunsuke
AU  - Minami, Kazuo
PY  - 2015/1/1
Y1  - 2015/1/1
N2  - Practical simulators require high-performance iterative methods and efficient boundary conditions, especially in the field of computational fluid dynamics. In this paper, we propose a novel bit-representation technique to enhance the performance of such simulators. The technique is applied to an iterative kernel implementation that treats various boundary conditions in a stencil computation on a structured grid system. This approach reduces traffic from the main memory to CPU, and effectively utilizes Single Instruction–Multiple Data (SIMD) stream units with cache because of the bit-representation and compression of matrix elements. The proposed implementation also replaces if-branch statements with mask operations using the bit expression. This promotes the optimization of code during compilation and runtime. To evaluate the performance of the proposed implementation, we employ the Red–Black SOR and BiCGstab algorithms. Experimental results show that the proposed approach is up to 3.5 times faster than a naïve implementation on both the Intel and Fujitsu Sparc architectures.
AB  - Practical simulators require high-performance iterative methods and efficient boundary conditions, especially in the field of computational fluid dynamics. In this paper, we propose a novel bit-representation technique to enhance the performance of such simulators. The technique is applied to an iterative kernel implementation that treats various boundary conditions in a stencil computation on a structured grid system. This approach reduces traffic from the main memory to CPU, and effectively utilizes Single Instruction–Multiple Data (SIMD) stream units with cache because of the bit-representation and compression of matrix elements. The proposed implementation also replaces if-branch statements with mask operations using the bit expression. This promotes the optimization of code during compilation and runtime. To evaluate the performance of the proposed implementation, we employ the Red–Black SOR and BiCGstab algorithms. Experimental results show that the proposed approach is up to 3.5 times faster than a naïve implementation on both the Intel and Fujitsu Sparc architectures.
UR  - http://www.scopus.com/inward/record.url?scp=84942627076&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84942627076&partnerID=8YFLogxK
U2  - 10.1007/978-3-319-17353-5_17
DO  - 10.1007/978-3-319-17353-5_17
M3  - Conference contribution
SN  - 9783319173528
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 192
EP  - 205
BT  - High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Revised Selected Papers
A2  - Marques, Osni
A2  - Dayde, Michel
A2  - Nakajima, Kengo
PB  - Springer Verlag
ER  - 