Analyzing resource trade-offs in hardware overprovisioned supercomputers

Ryuichi Sakamoto, Tapasya Patki, Thang Cao, Masaaki Kondo, Koji Inoue, Masatsugu Ueda, Daniel Ellsworth, Barry Rountree, Martin Schulz

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

3 Citations (Scopus)

Abstract

Hardware overprovisioned systems have recently been proposed as a viable alternative for a power-efficient design of next-generation supercomputers. A key challenge for such systems is to determine the degree of overprovisioning, which refers to the number of extra nodes that need to be installed under a given power constraint. In this paper, we first show that the degree of overprovisioning depends on dynamic parameters, such as the job mix as well as the global power constraint, and that static decisions can result in limited system throughput. We then study an exhaustive combination of adaptive resource management strategies that span three job scheduling algorithms, four power capping techniques, and three node boot-up mechanisms to understand the trade-off space involved. We then draw conclusions about how these strategies can adaptively control the degree of overprovisioning and analyze their impact on job throughput and power utilization.

Original language: English
Title of host publication: Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 526-535
Number of pages: 10
ISBN (Print): 9781538643686
DOIs: https://doi.org/10.1109/IPDPS.2018.00062
Publication status: Published - Aug 3 2018
Externally published: Yes
Event: 32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018 - Vancouver, Canada
Duration: May 21 2018 - May 25 2018

Publication series

Name: Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018

Other

Other: 32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018
Country: Canada
City: Vancouver
Period: 5/21/18 - 5/25/18

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems and Management

Cite this

Sakamoto, R., Patki, T., Cao, T., Kondo, M., Inoue, K., Ueda, M., ... Schulz, M. (2018). Analyzing resource trade-offs in hardware overprovisioned supercomputers. In Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018 (pp. 526-535). [8425206] (Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IPDPS.2018.00062