Effect of dynamic algorithm selection of Alltoall communication on environments with unstable network speed

Takeshi Nanri, Motoyoshi Kurokawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

As HPC systems grow in size, the performance of collective communications is becoming an important issue. Usually, the algorithm used for such a communication is chosen based on statically specified thresholds on the message size and the number of processes. However, on recent HPC systems that employ a Fat Tree or Torus topology as their interconnect, the network speed has become unpredictable. The main reason is contention, whose effect depends heavily on the relative locations of the compute nodes. At the same time, to reduce the number of idle nodes, there are attempts to build job schedulers that allocate compute nodes flexibly, without considering their positions relative to each other. Under this policy, network performance becomes unstable. As an approach for finding an appropriate algorithm even in such an environment, a dynamic method, STAR-MPI, has been proposed. This method tries each algorithm at runtime and uses the empirical data to choose a suitable one for the given situation. This paper first examined the effect of STAR-MPI in an environment with unstable network speed. The experimental results showed that the dynamic approach was effective, but that the cost of testing slow algorithms limited its benefit. The authors then proposed an enhancement in which algorithms predicted to be relatively slow were discarded from the list of candidates. The predictions used performance models of the algorithms together with the latency and bandwidth measured at the first call of the collective communication. At this point, the effect of this enhancement shown in the experimental results was not significant. However, the results indicated that better performance may be achievable by using a more cost-effective method of prediction and by tuning the thresholds and factors used in the enhancement.
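
The enhancement described in the abstract (discard candidates that a performance model predicts to be slow, then choose among the rest from runtime measurements) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the two candidate algorithms, their simplified latency/bandwidth cost formulas, the `factor` threshold, and the `measure` callback are hypothetical and are not taken from the paper or from STAR-MPI.

```python
import math
from statistics import mean

# Simplified latency/bandwidth cost models (illustrative assumptions).
# p: number of processes, m: bytes sent to each peer,
# alpha: per-message latency in seconds, beta: seconds per byte.

def predict_pairwise(p, m, alpha, beta):
    """Pairwise exchange: p-1 steps, each sending one m-byte block."""
    return (p - 1) * (alpha + m * beta)

def predict_bruck(p, m, alpha, beta):
    """Bruck-style exchange: ceil(log2 p) steps, each sending about p/2 blocks."""
    steps = math.ceil(math.log2(p))
    return steps * (alpha + (p / 2) * m * beta)

MODELS = {"pairwise": predict_pairwise, "bruck": predict_bruck}

def prune_candidates(p, m, alpha, beta, factor=2.0):
    """Keep only algorithms predicted to be within `factor` of the best prediction."""
    predicted = {name: model(p, m, alpha, beta) for name, model in MODELS.items()}
    best = min(predicted.values())
    return [name for name, t in predicted.items() if t <= factor * best]

def select_algorithm(candidates, measure, trials=3):
    """Time each remaining candidate `trials` times and return the fastest on average.

    `measure(name)` is a hypothetical callback that runs the named algorithm
    once and returns its elapsed time in seconds.
    """
    averages = {name: mean(measure(name) for _ in range(trials)) for name in candidates}
    return min(averages, key=averages.get)
```

In this sketch, `alpha` and `beta` stand in for the latency and bandwidth measured at the first call of the collective, and `factor` plays the role of a tunable threshold: a larger value keeps more candidates for runtime testing, while a smaller one prunes more aggressively at the risk of discarding the algorithm that would have been fastest under contention.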

Original language: English
Title of host publication: Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011
Pages: 693-698
Number of pages: 6
DOIs: https://doi.org/10.1109/HPCSim.2011.5999894
Publication status: Published - Sep 26 2011
Event: 2011 International Conference on High Performance Computing and Simulation, HPCS 2011 - Istanbul, Turkey
Duration: Jul 4 2011 → Jul 8 2011

Publication series

Name: Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011

Other

Other: 2011 International Conference on High Performance Computing and Simulation, HPCS 2011
Country: Turkey
City: Istanbul
Period: 7/4/11 → 7/8/11

Fingerprint

Dynamic Algorithms
Unstable
Communication
Collective Communication
Enhancement
Vertex of a graph
Prediction
Costs
Network Performance
Performance Model
Interconnect
Contention
Oils and fats
Scheduler
Latency
Tuning
Torus
Choose
Bandwidth

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Modelling and Simulation

Cite this

Nanri, T., & Kurokawa, M. (2011). Effect of dynamic algorithm selection of Alltoall communication on environments with unstable network speed. In Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011 (pp. 693-698). [5999894] (Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011). https://doi.org/10.1109/HPCSim.2011.5999894

@inproceedings{0efdff5dded847ebaabe8bcc04908988,
title = "Effect of dynamic algorithm selection of Alltoall communication on environments with unstable network speed",
author = "Takeshi Nanri and Motoyoshi Kurokawa",
year = "2011",
month = "9",
day = "26",
doi = "10.1109/HPCSim.2011.5999894",
language = "English",
isbn = "9781612843810",
series = "Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011",
pages = "693--698",
booktitle = "Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011",

}

TY - GEN

T1 - Effect of dynamic algorithm selection of Alltoall communication on environments with unstable network speed

AU - Nanri, Takeshi

AU - Kurokawa, Motoyoshi

PY - 2011/9/26

Y1 - 2011/9/26

UR - http://www.scopus.com/inward/record.url?scp=80053021678&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053021678&partnerID=8YFLogxK

U2 - 10.1109/HPCSim.2011.5999894

DO - 10.1109/HPCSim.2011.5999894

M3 - Conference contribution

AN - SCOPUS:80053021678

SN - 9781612843810

T3 - Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011

SP - 693

EP - 698

BT - Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011

ER -