Data I/O management approach for the post-hoc visualization of big simulation data results

Jorji Nonaka, Eduardo C. Inacio, Kenji Ono, Mario A.R. Dantas, Yasuhiro Kawashima, Tomohiro Kawanabe, Fumiyoshi Shoji

Research output: Contribution to journal (article)

1 Citation (Scopus)

Abstract

Leading-edge supercomputers, such as the K computer, have generated vast amounts of simulation results, and most of these datasets were stored on the file system for post-hoc analysis such as visualization. In this work, we first investigated the data generation trends of the K computer by analyzing operational log data files. We verified a tendency to generate large numbers of distributed files as simulation outputs; in most cases, the number of files was proportional to the number of computational nodes used, that is, each computational node produced one or more files. Because the computational cost of visualization tasks is usually much smaller than that of large-scale numerical simulations, post-hoc visualization typically runs on fewer nodes than the simulation that produced the data, so a flexible data input/output (I/O) management mechanism becomes highly useful for post-hoc visualization and analysis. We focused on the xDMlib data management library and its flexible data I/O mechanism in order to enable flexible loading of big computational climate simulation results. In the proposed approach, a pre-processing step is executed on the target distributed files to generate the lightweight metadata needed to build the data assignment mapping used in the subsequent data loading process. We evaluated the proposed approach on a 32-node visualization cluster and on the K computer. Despite the inevitable performance penalty of longer data loading times when using a smaller number of processes, the approach has the benefit of avoiding any data replication via copy, conversion, or extraction. In addition, users can freely select any number of nodes for post-hoc visualization and analysis, regardless of the number of distributed files.
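
The data assignment mapping described in the abstract can be illustrated with a minimal sketch. It is not the xDMlib API and not the authors' implementation: the names `FileMeta` and `build_assignment`, the file names, and the contiguous block policy are all assumptions made for illustration. The sketch only shows how lightweight per-file metadata could be used to assign M distributed files to N loading processes when N differs from M, and it covers whole-file assignment only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical lightweight metadata record for one distributed file."""
    path: str
    n_bytes: int


def build_assignment(files: List[FileMeta], n_procs: int) -> Dict[int, List[FileMeta]]:
    """Map M files onto N loader processes in contiguous, near-equal blocks.

    Assumed policy: the first (M mod N) ranks receive one extra file.
    Ranks may receive no files when N > M.
    """
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    base, extra = divmod(len(files), n_procs)
    mapping: Dict[int, List[FileMeta]] = {}
    start = 0
    for rank in range(n_procs):
        count = base + (1 if rank < extra else 0)
        mapping[rank] = files[start:start + count]
        start += count
    return mapping


# Example: 10 simulation output files loaded by 4 visualization processes.
files = [FileMeta(f"out_{i:04d}.dat", 1024) for i in range(10)]
mapping = build_assignment(files, 4)
for rank, assigned in mapping.items():
    print(rank, [f.path for f in assigned])
```

Each process reads only the files listed under its own rank, so the original files are used in place and no copy or conversion is needed. A size-aware policy based on `n_bytes` would be a natural variation but is likewise an assumption.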

Original language: English
Article number: 1840006
Journal: International Journal of Modeling, Simulation, and Scientific Computing
Volume: 9
Issue number: 3
DOI: 10.1142/S1793962318400068
Publication status: Published - Jun 1 2018

Fingerprint

Visualization
Output
Simulation
Vertex of a graph
Supercomputers
Metadata
Data Replication
Information management
File System
Datalog
Supercomputer
Data Management
Climate
Preprocessing
Penalty
Computational Cost
Assignment
Computer simulation
Directly proportional
Processing

All Science Journal Classification (ASJC) codes

  • Modelling and Simulation
  • Computer Science Applications

Cite this

Data I/O management approach for the post-hoc visualization of big simulation data results. / Nonaka, Jorji; Inacio, Eduardo C.; Ono, Kenji; Dantas, Mario A.R.; Kawashima, Yasuhiro; Kawanabe, Tomohiro; Shoji, Fumiyoshi.

In: International Journal of Modeling, Simulation, and Scientific Computing, Vol. 9, No. 3, 1840006, 01.06.2018.

@article{2a3cc8ab31d545448f854cdf94948af7,
title = "Data I/O management approach for the post-hoc visualization of big simulation data results",
author = "Jorji Nonaka and Inacio, {Eduardo C.} and Kenji Ono and Dantas, {Mario A.R.} and Yasuhiro Kawashima and Tomohiro Kawanabe and Fumiyoshi Shoji",
year = "2018",
month = "6",
day = "1",
doi = "10.1142/S1793962318400068",
language = "English",
volume = "9",
journal = "International Journal of Modeling, Simulation, and Scientific Computing",
issn = "1793-9623",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "3",

}

TY - JOUR

T1 - Data I/O management approach for the post-hoc visualization of big simulation data results

AU - Nonaka, Jorji

AU - Inacio, Eduardo C.

AU - Ono, Kenji

AU - Dantas, Mario A.R.

AU - Kawashima, Yasuhiro

AU - Kawanabe, Tomohiro

AU - Shoji, Fumiyoshi

PY - 2018/6/1

Y1 - 2018/6/1

UR - http://www.scopus.com/inward/record.url?scp=85045113329&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045113329&partnerID=8YFLogxK

U2 - 10.1142/S1793962318400068

DO - 10.1142/S1793962318400068

M3 - Article

VL - 9

JO - International Journal of Modeling, Simulation, and Scientific Computing

JF - International Journal of Modeling, Simulation, and Scientific Computing

SN - 1793-9623

IS - 3

M1 - 1840006

ER -