TY - JOUR
T1 - Missing data imputation of MAGDAS-9’s ground electromagnetism with supervised machine learning and conventional statistical analysis models
AU - H., Muhammad Asraf
AU - Nur, Nur Dalila
AU - Md Tahir, Nooritawati
AU - Iffah Abd Latiff, Zatul
AU - Huzaimy Jusoh, Mohamad
AU - Akimasa, Yoshikawa
N1 - Funding Information:
The acknowledgement is dedicated to support grants obtained from the Ministry of Higher Education (MoHE), Malaysia, and the Faculty of Electrical Engineering, Universiti Teknologi MARA under the Fundamental Research Grant Scheme, FRGS (Grant No. 600-IRMI / FRGS 5/3 (091/2019)). The work was also supported by JSPS KAKENHI Grant Number JP20H01961. The authors also acknowledge the collaboration with the Malaysian Space Agency (MYSA) and the International Center for Space Weather, Japan.
Publisher Copyright:
© 2021 THE AUTHORS
PY - 2022/1
Y1 - 2022/1
AB - Data imputation studies include reconstruction or estimation of imperfect data gaps caused by system sensing failure, and non-responsive data transmission remains an open issue. In space weather applications, imputation of ground electromagnetism is significant in capturing the complex interaction of sun–earth prior to the subsequent analysis of the space weather effects. Key contributions to the demonstration of supervised machine learning (ML) imputation approach with artificial neural network, K-nearest neighbour, support vector regression (SVR), and General Regression Neural Network (GRNN) for MAGDAS-9 ground electromagnetism dataset have not yet been established. A total of 1,585,950 data points were analysed with supervised ML models which included performance benchmark with statistical analysis namely zero value substitution, listwise deletion, mean substitution, and hot deck imputation. To achieve low reconstruction errors, different imputation models with hyperparameter tuned settings are varied, and computational time execution has been shown to contribute to imputation performance. Performance metrics measured by mean square error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and execution time respectively demonstrate the capability of SVR to perfectly impute missing data for all ground electromagnetism components at an average of 0.314 MSE, 0.738 MAPE, closeness to 0.510 MAE and 0.91-second at various percentage level of data missingness. A comparison with traditional imputation shows that the supervised ML with SVR model has improved imputation performance by up to 80% of data gap. The outcome of the proposed imputation will benefit space weather applications for event characterisation, which will cover a large number of missing data in the MAGDAS-9 dataset.
UR - http://www.scopus.com/inward/record.url?scp=85107441696&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107441696&partnerID=8YFLogxK
U2 - 10.1016/j.aej.2021.04.096
DO - 10.1016/j.aej.2021.04.096
M3 - Article
AN - SCOPUS:85107441696
SN - 1110-0168
VL - 61
SP - 937
EP - 947
JO - AEJ - Alexandria Engineering Journal
JF - AEJ - Alexandria Engineering Journal
IS - 1
ER -