TY - GEN
T1 - Implementation and evaluation of Fock matrix calculation program on the Cell processor
AU - Honda, Hiroaki
AU - Hayashi, Tetsuo
AU - Inadomi, Yuichi
AU - Inoue, Koji
AU - Murakami, Kazuaki J.
PY - 2007
Y1 - 2007
N2 - Many processor architectures have been proposed to date, and performance has improved remarkably. Recently, chip multiprocessors (CMPs), which integrate many processor cores on a single chip, have been proposed for further performance improvement. The Cell processor is one such CMP and shows high computational performance. Although this processor was designed for multimedia applications, its high performance can also be applied to molecular orbital calculations. In this study, we implemented a Fock matrix construction program on the Cell processor and evaluated its computational performance. We found two main kinds of stalls, caused by branch prediction and data alignment, both of which are controlled by software mechanisms in order to simplify the Cell processor hardware. Performance could be improved by about 30% if the branch prediction hit ratio were raised to 99%. As for data alignment stalls, the portion that originates in the data shuffle pipeline could be reduced by providing a hardware data alignment mechanism.
AB - Many processor architectures have been proposed to date, and performance has improved remarkably. Recently, chip multiprocessors (CMPs), which integrate many processor cores on a single chip, have been proposed for further performance improvement. The Cell processor is one such CMP and shows high computational performance. Although this processor was designed for multimedia applications, its high performance can also be applied to molecular orbital calculations. In this study, we implemented a Fock matrix construction program on the Cell processor and evaluated its computational performance. We found two main kinds of stalls, caused by branch prediction and data alignment, both of which are controlled by software mechanisms in order to simplify the Cell processor hardware. Performance could be improved by about 30% if the branch prediction hit ratio were raised to 99%. As for data alignment stalls, the portion that originates in the data shuffle pipeline could be reduced by providing a hardware data alignment mechanism.
UR - http://www.scopus.com/inward/record.url?scp=71449123378&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=71449123378&partnerID=8YFLogxK
U2 - 10.1063/1.2836167
DO - 10.1063/1.2836167
M3 - Conference contribution
AN - SCOPUS:71449123378
SN - 9780735404786
T3 - AIP Conference Proceedings
SP - 64
EP - 67
BT - Computation in Modern Science and Engineering - Proceedings of the International Conference on Computational Methods in Science and Engineering 2007 (ICCMSE 2007)
T2 - International Conference on Computational Methods in Science and Engineering 2007, ICCMSE 2007
Y2 - 25 September 2007 through 30 September 2007
ER -