TY - GEN
T1 - Effect of reordering internal messages in MPI broadcast according to the load imbalance
AU - Soga, Takesi
AU - Nanri, Takeshi
AU - Kurokawa, Motoyoshi
AU - Murakami, Kazuaki
PY - 2008/12/1
Y1 - 2008/12/1
N2 - To achieve higher scalability of parallel programs on large-scale parallel computers, reducing the time spent on collective communications is one of the most important issues. In this paper, a dynamic optimization method to adjust the implementation of the Broadcast operation, one of the most popular collective communications, is introduced. Though there have been many attempts to speed up this operation, they assume that each rank starts the operation at the same time. However, in real execution, the start times can differ because of load imbalance among ranks. This paper first claims that this difference can increase the cost of the operation. Then, as a method to avoid this problem, an optimization method that adjusts the order of point-to-point messages in Broadcast operations is introduced. This method uses the wait time of each rank at the operation to determine the status of load imbalance. The experimental results show that this optimization method can reduce the time for the operation. They also show that the effect of the optimization depends on the size of the data to be broadcast and the amount of load imbalance.
AB - To achieve higher scalability of parallel programs on large-scale parallel computers, reducing the time spent on collective communications is one of the most important issues. In this paper, a dynamic optimization method to adjust the implementation of the Broadcast operation, one of the most popular collective communications, is introduced. Though there have been many attempts to speed up this operation, they assume that each rank starts the operation at the same time. However, in real execution, the start times can differ because of load imbalance among ranks. This paper first claims that this difference can increase the cost of the operation. Then, as a method to avoid this problem, an optimization method that adjusts the order of point-to-point messages in Broadcast operations is introduced. This method uses the wait time of each rank at the operation to determine the status of load imbalance. The experimental results show that this optimization method can reduce the time for the operation. They also show that the effect of the optimization depends on the size of the data to be broadcast and the amount of load imbalance.
UR - http://www.scopus.com/inward/record.url?scp=77952640335&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952640335&partnerID=8YFLogxK
U2 - 10.1109/IWIA.2008.14
DO - 10.1109/IWIA.2008.14
M3 - Conference contribution
AN - SCOPUS:77952640335
SN - 9780769537702
T3 - Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems
SP - 11
EP - 16
BT - Innovative Architecture for Future Generation High-Performance Processors and Systems, IWIA 2008
T2 - 2008 International Workshop on Innovative Architecture for Future Generation High Performance Processors and Systems, IWIA'08
Y2 - 21 January 2008 through 23 January 2008
ER -