Effect of reordering internal messages in MPI broadcast according to the load imbalance

Takesi Soga, Takeshi Nanri, Motoyoshi Kurokawa, Kazuaki Murakami

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

To achieve higher scalability of parallel programs on large-scale parallel computers, reducing the time spent on collective communications is one of the most important issues. This paper introduces a dynamic optimization method that adjusts the implementation of the Broadcast operation, one of the most popular collective communications. Although there have been many attempts to speed up this operation, they assume that every rank starts the operation at the same time. In real execution, however, the start times can differ because of load imbalance among ranks. This paper first argues that this difference can increase the cost of the operation. Then, to avoid this problem, it introduces an optimization method that adjusts the order of point-to-point messages in Broadcast operations. The method uses the wait time of each rank at the operation to determine the status of the load imbalance. Experimental results show that this optimization method can reduce the time for the operation. They also show that the effect of the optimization depends on the size of the data to be broadcast and the amount of load imbalance.
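
The reordering idea in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: it assumes a flat broadcast in which the root sends one message per destination, a rendezvous-style send that cannot start before the receiver has entered the broadcast, and a fixed per-message cost. The names `flat_broadcast`, `reorder_by_wait`, and `msg_time` are hypothetical.

```python
def flat_broadcast(arrival, order, root=0, msg_time=1.0):
    """Return the time at which each rank holds the data in a flat broadcast.

    arrival[r] is when rank r enters the broadcast; the root sends one
    message per destination, strictly in the given order.
    """
    clock = arrival[root]
    finish = {root: clock}
    for dest in order:
        # Assumed rendezvous model: a send cannot start before the
        # receiver has entered the broadcast, so a late rank stalls the root.
        clock = max(clock, arrival[dest]) + msg_time
        finish[dest] = clock
    return finish


def reorder_by_wait(arrival, finish, root=0):
    """Order destinations by decreasing wait time seen in the last broadcast."""
    waits = {r: finish[r] - arrival[r] for r in finish if r != root}
    # A long wait means the rank arrived early, so it is served first next time.
    return sorted(waits, key=lambda r: waits[r], reverse=True)


arrival = [0.0, 5.0, 0.0, 0.0]  # rank 1 is delayed by load imbalance
baseline = flat_broadcast(arrival, [1, 2, 3])
tuned_order = reorder_by_wait(arrival, baseline)
tuned = flat_broadcast(arrival, tuned_order)
print(max(baseline.values()), max(tuned.values()))
```

In this toy setting, sending to the late rank last lets the two early ranks receive their data while the root would otherwise be stalled, so the broadcast completes at 6.0 time units instead of 8.0. The benefit in a real MPI library would depend on the broadcast algorithm, the message size, and how stable the imbalance is between calls.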

Original language: English
Title of host publication: Innovative Architecture for Future Generation High-Performance Processors and Systems, IWIA 2008
Pages: 11-16
Number of pages: 6
DOI: https://doi.org/10.1109/IWIA.2008.14
Publication status: Published - Dec 1 2008
Event: 2008 International Workshop on Innovative Architecture for Future Generation High Performance Processors and Systems, IWIA'08 - Hilo, HI, United States
Duration: Jan 21 2008 to Jan 23 2008

Publication series

Name: Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems

Other

Other: 2008 International Workshop on Innovative Architecture for Future Generation High Performance Processors and Systems, IWIA'08
United States
Hilo, HI
Period: 1/21/08 to 1/23/08

Fingerprint

Communication
Scalability
Costs
Experiments

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Cite this

Soga, T., Nanri, T., Kurokawa, M., & Murakami, K. (2008). Effect of reordering internal messages in MPI broadcast according to the load imbalance. In Innovative Architecture for Future Generation High-Performance Processors and Systems, IWIA 2008 (pp. 11-16). [5453548] (Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems). https://doi.org/10.1109/IWIA.2008.14

Effect of reordering internal messages in MPI broadcast according to the load imbalance. / Soga, Takesi; Nanri, Takeshi; Kurokawa, Motoyoshi; Murakami, Kazuaki.

Innovative Architecture for Future Generation High-Performance Processors and Systems, IWIA 2008. 2008. p. 11-16 5453548 (Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems).

Soga, T, Nanri, T, Kurokawa, M & Murakami, K 2008, Effect of reordering internal messages in MPI broadcast according to the load imbalance. in Innovative Architecture for Future Generation High-Performance Processors and Systems, IWIA 2008., 5453548, Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems, pp. 11-16, 2008 International Workshop on Innovative Architecture for Future Generation High Performance Processors and Systems, IWIA'08, Hilo, HI, United States, 1/21/08. https://doi.org/10.1109/IWIA.2008.14
Soga T, Nanri T, Kurokawa M, Murakami K. Effect of reordering internal messages in MPI broadcast according to the load imbalance. In Innovative Architecture for Future Generation High-Performance Processors and Systems, IWIA 2008. 2008. p. 11-16. 5453548. (Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems). https://doi.org/10.1109/IWIA.2008.14
Soga, Takesi ; Nanri, Takeshi ; Kurokawa, Motoyoshi ; Murakami, Kazuaki. / Effect of reordering internal messages in MPI broadcast according to the load imbalance. Innovative Architecture for Future Generation High-Performance Processors and Systems, IWIA 2008. 2008. pp. 11-16 (Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems).
@inproceedings{9e194eafd7684693bab92cd99ab1f4ea,
title = "Effect of reordering internal messages in MPI broadcast according to the load imbalance",
abstract = "To achieve higher scalability of parallel programs on large-scale parallel computers, reducing the time spent on collective communications is one of the most important issues. This paper introduces a dynamic optimization method that adjusts the implementation of the Broadcast operation, one of the most popular collective communications. Although there have been many attempts to speed up this operation, they assume that every rank starts the operation at the same time. In real execution, however, the start times can differ because of load imbalance among ranks. This paper first argues that this difference can increase the cost of the operation. Then, to avoid this problem, it introduces an optimization method that adjusts the order of point-to-point messages in Broadcast operations. The method uses the wait time of each rank at the operation to determine the status of the load imbalance. Experimental results show that this optimization method can reduce the time for the operation. They also show that the effect of the optimization depends on the size of the data to be broadcast and the amount of load imbalance.",
author = "Takesi Soga and Takeshi Nanri and Motoyoshi Kurokawa and Kazuaki Murakami",
year = "2008",
month = "12",
day = "1",
doi = "10.1109/IWIA.2008.14",
language = "English",
isbn = "9780769537702",
series = "Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems",
pages = "11--16",
booktitle = "Innovative Architecture for Future Generation High-Performance Processors and Systems, IWIA 2008",

}

TY - GEN

T1 - Effect of reordering internal messages in MPI broadcast according to the load imbalance

AU - Soga, Takesi

AU - Nanri, Takeshi

AU - Kurokawa, Motoyoshi

AU - Murakami, Kazuaki

PY - 2008/12/1

Y1 - 2008/12/1

N2 - To achieve higher scalability of parallel programs on large-scale parallel computers, reducing the time spent on collective communications is one of the most important issues. This paper introduces a dynamic optimization method that adjusts the implementation of the Broadcast operation, one of the most popular collective communications. Although there have been many attempts to speed up this operation, they assume that every rank starts the operation at the same time. In real execution, however, the start times can differ because of load imbalance among ranks. This paper first argues that this difference can increase the cost of the operation. Then, to avoid this problem, it introduces an optimization method that adjusts the order of point-to-point messages in Broadcast operations. The method uses the wait time of each rank at the operation to determine the status of the load imbalance. Experimental results show that this optimization method can reduce the time for the operation. They also show that the effect of the optimization depends on the size of the data to be broadcast and the amount of load imbalance.

UR - http://www.scopus.com/inward/record.url?scp=77952640335&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77952640335&partnerID=8YFLogxK

U2 - 10.1109/IWIA.2008.14

DO - 10.1109/IWIA.2008.14

M3 - Conference contribution

AN - SCOPUS:77952640335

SN - 9780769537702

T3 - Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems

SP - 11

EP - 16

BT - Innovative Architecture for Future Generation High-Performance Processors and Systems, IWIA 2008

ER -