TY - GEN
T1 - Performance characteristics of Graph500 on large-scale distributed environment
AU - Suzumura, Toyotaro
AU - Ueno, Koji
AU - Sato, Hitoshi
AU - Fujisawa, Katsuki
AU - Matsuoka, Satoshi
PY - 2011
Y1 - 2011
N2 - Graph500 is a new benchmark for supercomputers based on large-scale graph analysis, which is becoming an important form of analysis in many real-world applications. Graph algorithms run well on supercomputers with shared memory. For the Linpack-based supercomputer rankings, TOP500 reports that heterogeneous and distributed-memory supercomputers with large numbers of GPGPUs are becoming dominant. However, the performance characteristics of large-scale graph analysis benchmarks such as Graph500 on distributed-memory supercomputers have so far received little study. This is the first report of a performance evaluation and analysis for Graph500 on a commodity-processor-based distributed-memory supercomputer. We found that the reference implementation "replicated-csr" based on distributed level-synchronized breadth-first search solves a large free graph problem with 2^31 vertices and 2^35 edges (approximately 2.15 billion vertices and 34.3 billion edges) in 3.09 seconds with 128 nodes and 3,072 cores. This equates to 11 giga-edges traversed per second. We describe the algorithms and implementations of the reference implementations of Graph500, and analyze the performance characteristics with varying graph sizes and numbers of computer nodes and different implementations. Our results will also contribute to the development of optimized algorithms for the coming exascale machines.
AB - Graph500 is a new benchmark for supercomputers based on large-scale graph analysis, which is becoming an important form of analysis in many real-world applications. Graph algorithms run well on supercomputers with shared memory. For the Linpack-based supercomputer rankings, TOP500 reports that heterogeneous and distributed-memory supercomputers with large numbers of GPGPUs are becoming dominant. However, the performance characteristics of large-scale graph analysis benchmarks such as Graph500 on distributed-memory supercomputers have so far received little study. This is the first report of a performance evaluation and analysis for Graph500 on a commodity-processor-based distributed-memory supercomputer. We found that the reference implementation "replicated-csr" based on distributed level-synchronized breadth-first search solves a large free graph problem with 2^31 vertices and 2^35 edges (approximately 2.15 billion vertices and 34.3 billion edges) in 3.09 seconds with 128 nodes and 3,072 cores. This equates to 11 giga-edges traversed per second. We describe the algorithms and implementations of the reference implementations of Graph500, and analyze the performance characteristics with varying graph sizes and numbers of computer nodes and different implementations. Our results will also contribute to the development of optimized algorithms for the coming exascale machines.
UR - http://www.scopus.com/inward/record.url?scp=84856155189&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84856155189&partnerID=8YFLogxK
U2 - 10.1109/IISWC.2011.6114175
DO - 10.1109/IISWC.2011.6114175
M3 - Conference contribution
AN - SCOPUS:84856155189
SN - 9781457720642
T3 - Proceedings - 2011 IEEE International Symposium on Workload Characterization, IISWC - 2011
SP - 149
EP - 158
BT - Proceedings - 2011 IEEE International Symposium on Workload Characterization, IISWC - 2011
T2 - 2011 IEEE International Symposium on Workload Characterization, IISWC - 2011
Y2 - 6 November 2011 through 8 November 2011
ER -