Parallel POD compression of time-varying big datasets using m-swap on the K computer

Chongke Bi, Kenji Ono, Lu Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Supercomputers have made increasingly complex simulations feasible. Analyzing and understanding the intrinsic properties of the big datasets these simulations produce is now an urgent research task for scientists, but the explosive size of the datasets makes it difficult. Reducing the size of big datasets has therefore become an important topic, in which data compression and parallel computing are the two key techniques. In this paper, we present a parallel data compression approach for reducing the size of time-varying big datasets. First, we employ the proper orthogonal decomposition (POD) method for compression. POD extracts the underlying features of a dataset and thereby greatly reduces its size. Moreover, the compressed datasets can be decompressed linearly, which helps scientists interactively visualize big datasets for analysis. Second, we introduce a novel m-swap method to parallelize the POD compression algorithm effectively. The m-swap method achieves high performance by fully using all parallel computing processors; in other words, no processor is idle during the parallel compression process. Furthermore, the m-swap method greatly reduces the cost of interprocessor communication. This is achieved by controlling the data transfer among 2m processors to obtain the best balance of computation cost across these processors. Finally, we demonstrate the effectiveness of our method by compressing several time-varying big datasets on the K computer with tens of thousands of processors.
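
To make the compression idea in the abstract concrete, the following is a minimal serial sketch of POD compression and its linear decompression. It assumes the common formulation of POD as a truncated singular value decomposition of a snapshot matrix (rows are grid points, columns are time steps); the abstract does not state the exact formulation the authors use. The function names `pod_compress` and `pod_decompress`, the synthetic field, and the chosen rank are all hypothetical, and the sketch does not attempt to reproduce the m-swap parallelization.

```python
import numpy as np


def pod_compress(snapshots: np.ndarray, rank: int):
    """Compress a (n_points, n_timesteps) snapshot matrix with truncated POD.

    Assumption: POD is computed here through a thin SVD, which is one
    standard way to obtain POD modes; the paper may use another route.
    """
    # Thin SVD: snapshots = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(snapshots, full_matrices=False)
    modes = u[:, :rank]                     # leading spatial POD modes
    coeffs = s[:rank, None] * vt[:rank, :]  # per-time-step coefficients
    return modes, coeffs


def pod_decompress(modes: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Linear decompression: a single matrix product."""
    return modes @ coeffs


# Synthetic time-varying field built from two space-time patterns,
# so a rank-2 POD basis reproduces it up to round-off.
x = np.linspace(0.0, 1.0, 200)[:, None]  # 200 grid points
t = np.linspace(0.0, 1.0, 50)[None, :]   # 50 time steps
field = (np.sin(2 * np.pi * x) * np.cos(2 * np.pi * t)
         + 0.5 * np.cos(4 * np.pi * x) * np.sin(6 * np.pi * t))

modes, coeffs = pod_compress(field, rank=2)
restored = pod_decompress(modes, coeffs)

stored_values = modes.size + coeffs.size  # values kept after compression
ratio = field.size / stored_values        # compression ratio
max_error = np.max(np.abs(field - restored))

print(f"compression ratio: {ratio:.1f}, max abs error: {max_error:.2e}")
```

Because decompression is a matrix product, a single time step `k` can be restored on its own as `modes @ coeffs[:, k]`, which is the kind of property that makes interactive visualization of a compressed dataset practical. Real simulation data is not exactly low rank, so the rank becomes a trade-off between size and reconstruction error.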

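Original language: English
Title of host publication: Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 438-445
Number of pages: 8
ISBN (Electronic): 9781479950577
DOI: https://doi.org/10.1109/BigData.Congress.2014.70
Publication status: Published - Sep 22 2014
Externally published: Yes
Event: 3rd IEEE International Congress on Big Data, BigData Congress 2014 - Anchorage, United States
Duration: Jun 27 2014 to Jul 2 2014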

Other

Other: 3rd IEEE International Congress on Big Data, BigData Congress 2014
Country: United States
City: Anchorage
Period: 6/27/14 to 7/2/14

Fingerprint

Data compression
Parallel processing systems
Decomposition
Supercomputers
Data transfer
Costs
Communication

All Science Journal Classification (ASJC) codes

  • Computer Science Applications

Cite this

Bi, C., Ono, K., & Yang, L. (2014). Parallel POD compression of time-varying big datasets using m-swap on the K computer. In Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014 (pp. 438-445). [6906813] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.Congress.2014.70

Parallel POD compression of time-varying big datasets using m-swap on the K computer. / Bi, Chongke; Ono, Kenji; Yang, Lu.

Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014. Institute of Electrical and Electronics Engineers Inc., 2014. p. 438-445 6906813.

Bi, C, Ono, K & Yang, L 2014, Parallel POD compression of time-varying big datasets using m-swap on the K computer. in Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014, 6906813, Institute of Electrical and Electronics Engineers Inc., pp. 438-445, 3rd IEEE International Congress on Big Data, BigData Congress 2014, Anchorage, United States, 6/27/14. https://doi.org/10.1109/BigData.Congress.2014.70
Bi C, Ono K, Yang L. Parallel POD compression of time-varying big datasets using m-swap on the K computer. In Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014. Institute of Electrical and Electronics Engineers Inc. 2014. p. 438-445. 6906813 https://doi.org/10.1109/BigData.Congress.2014.70
Bi, Chongke ; Ono, Kenji ; Yang, Lu. / Parallel POD compression of time-varying big datasets using m-swap on the K computer. Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 438-445
@inproceedings{abad0bee76874e47ad6c460eda55f868,
title = "Parallel POD compression of time-varying big datasets using m-swap on the K computer",
author = "Chongke Bi and Kenji Ono and Lu Yang",
year = "2014",
month = "9",
day = "22",
doi = "10.1109/BigData.Congress.2014.70",
language = "English",
pages = "438--445",
booktitle = "Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Parallel POD compression of time-varying big datasets using m-swap on the K computer

AU - Bi, Chongke

AU - Ono, Kenji

AU - Yang, Lu

PY - 2014/9/22

Y1 - 2014/9/22

UR - http://www.scopus.com/inward/record.url?scp=84923902633&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84923902633&partnerID=8YFLogxK

U2 - 10.1109/BigData.Congress.2014.70

DO - 10.1109/BigData.Congress.2014.70

M3 - Conference contribution

AN - SCOPUS:84923902633

SP - 438

EP - 445

BT - Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014

PB - Institute of Electrical and Electronics Engineers Inc.

ER -