Hybrid BFS approach using semi-external memory

Keita Iwabuchi, Hitoshi Sato, Ryo Mizote, Yuichiro Yasui, Katsuki Fujisawa, Satoshi Matsuoka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

NVM devices will greatly expand the possibility of processing extremely large-scale graphs that exceed the DRAM capacity of the nodes. However, efficient implementations based on detailed performance analysis of the access patterns of unstructured graph kernels on systems that mix DRAM and NVM devices have not been well investigated. We introduce a graph data offloading technique using NVMs that augments the hybrid BFS (breadth-first search) algorithm widely used in the Graph500 benchmark, and conduct performance analysis to demonstrate the utility of NVMs for unstructured data. Experimental results for a SCALE 27 problem of a Kronecker graph compliant with the Graph500 benchmark show that our approach sustains up to 4.22 GTEPS (giga traversed edges per second), reducing DRAM size by half with only 19.18% performance degradation on a 4-way AMD Opteron 6172 machine heavily equipped with NVM devices. Although direct comparison is difficult, this is significantly greater than the 0.05 GTEPS for a SCALE 36 problem using 1 TB of DRAM and 12 TB of NVM reported by Pearce et al. Although our approach uses a higher DRAM-to-NVM ratio, we show that a good compromise between performance and capacity is achievable for processing large-scale graphs. This result, together with detailed performance analysis of the proposed technique, suggests that we can process extremely large-scale graphs per node with minimal performance degradation by carefully considering the data structures of a given graph and the access patterns to both DRAM and NVM devices. As a result, our implementation achieved 4.35 MTEPS/W (mega TEPS per watt) and ranked 4th on the November 2013 edition of the Green Graph500 list in the Big Data category using only a single fat server heavily equipped with NVMs.
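
For readers unfamiliar with the hybrid BFS the abstract refers to, the following is a minimal in-memory sketch of the general direction-switching idea (top-down steps while the frontier is small, bottom-up steps while it is large). It is not the authors' implementation and does not model DRAM/NVM placement or offloading. The function name `hybrid_bfs`, the adjacency-list layout, and the thresholds `alpha` and `beta` are assumptions chosen for illustration.

```python
def build_adjacency(edges, n):
    """Build an undirected adjacency list for vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def hybrid_bfs(adj, root, alpha=14, beta=24):
    """Level-synchronous BFS that switches between top-down and bottom-up.

    alpha and beta are illustrative switching thresholds (assumptions),
    not values taken from the paper. Returns a parent array in which
    unreached vertices keep the value -1.
    """
    n = len(adj)
    parent = [-1] * n
    parent[root] = root
    frontier = [root]
    # Sum of degrees of vertices that have not been visited yet.
    unexplored_edges = sum(len(a) for a in adj) - len(adj[root])
    bottom_up = False

    while frontier:
        frontier_edges = sum(len(adj[v]) for v in frontier)
        if not bottom_up and frontier_edges > unexplored_edges / alpha:
            bottom_up = True   # frontier is large: let unvisited vertices search
        elif bottom_up and len(frontier) < n / beta:
            bottom_up = False  # frontier is small again: expand it directly

        next_frontier = []
        if bottom_up:
            in_frontier = [False] * n
            for v in frontier:
                in_frontier[v] = True
            # Each unvisited vertex looks for any neighbor in the frontier.
            for v in range(n):
                if parent[v] == -1:
                    for u in adj[v]:
                        if in_frontier[u]:
                            parent[v] = u
                            next_frontier.append(v)
                            break
        else:
            # Each frontier vertex claims its unvisited neighbors.
            for u in frontier:
                for v in adj[u]:
                    if parent[v] == -1:
                        parent[v] = u
                        next_frontier.append(v)

        unexplored_edges -= sum(len(adj[v]) for v in next_frontier)
        frontier = next_frontier
    return parent


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    adj = build_adjacency(edges, 6)  # vertex 5 is isolated
    print(hybrid_bfs(adj, 0))
```

Both step types produce a valid BFS tree; they differ only in which side of each edge does the scanning. The bottom-up step can stop at the first frontier neighbor it finds, which is why it tends to scan fewer edges when the frontier covers much of the graph.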

Original language: English
Title of host publication: Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014
Publisher: IEEE Computer Society
Pages: 1698-1707
Number of pages: 10
ISBN (Electronic): 9780769552088
DOI: https://doi.org/10.1109/IPDPSW.2014.189
Publication status: Published - Nov 27 2014
Event: 28th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014 - Phoenix, United States
Duration: May 19 2014 to May 23 2014

Publication series

Name: Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS
ISSN (Print): 1530-2075
ISSN (Electronic): 2332-1237

Other

Other: 28th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014
Country: United States
City: Phoenix
Period: 5/19/14 to 5/23/14

Fingerprint

Dynamic random access storage
Data storage equipment
Degradation
Processing
Oils and fats
Data structures
Servers

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Hardware and Architecture
  • Software

Cite this

Iwabuchi, K., Sato, H., Mizote, R., Yasui, Y., Fujisawa, K., & Matsuoka, S. (2014). Hybrid BFS approach using semi-external memory. In Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014 (pp. 1698-1707). [6969580] (Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS). IEEE Computer Society. https://doi.org/10.1109/IPDPSW.2014.189

@inproceedings{d7135a3d6a0a445dbb6c7ea0123d95da,
title = "Hybrid BFS approach using semi-external memory",
author = "Keita Iwabuchi and Hitoshi Sato and Ryo Mizote and Yuichiro Yasui and Katsuki Fujisawa and Satoshi Matsuoka",
year = "2014",
month = "11",
day = "27",
doi = "10.1109/IPDPSW.2014.189",
language = "English",
series = "Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS",
publisher = "IEEE Computer Society",
pages = "1698--1707",
booktitle = "Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014",
address = "United States",

}

TY - GEN

T1 - Hybrid BFS approach using semi-external memory

AU - Iwabuchi, Keita

AU - Sato, Hitoshi

AU - Mizote, Ryo

AU - Yasui, Yuichiro

AU - Fujisawa, Katsuki

AU - Matsuoka, Satoshi

PY - 2014/11/27

Y1 - 2014/11/27

UR - http://www.scopus.com/inward/record.url?scp=84918791179&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84918791179&partnerID=8YFLogxK

U2 - 10.1109/IPDPSW.2014.189

DO - 10.1109/IPDPSW.2014.189

M3 - Conference contribution

AN - SCOPUS:84918791179

T3 - Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS

SP - 1698

EP - 1707

BT - Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014

PB - IEEE Computer Society

ER -