TY - GEN
T1 - Missing Value Imputation for Remote Healthcare Data
T2 - 9th International Japan-Africa Conference on Electronics, Communications, and Computations, JAC-ECC 2021
AU - Imamura, Yosuke
AU - Abedin, Nuren
AU - Sixian, Luo
AU - Tabassum, Shaira
AU - Ahmed, Ashir
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - This study aims to identify the best method for imputing missing values in a remote healthcare data set. A missing value is an empty field in a health record. It may occur for three major reasons: (i) the parameter was not measured, (ii) it was measured but not recorded, or (iii) it was lost during communication. Our case study, the Portable Health Clinic (PHC) data set, was collected from multiple regions by different authorities at different times. PHC data also contains manual entry errors. Missing and erroneous data are problematic for data analysis and for making accurate predictions. Hence, it is necessary to detect and eliminate erroneous data and to fill the empty fields. Missing value imputation methods are widely known for processing numerical data. PHC data contains both numerical and categorical data, which makes it difficult to impute. We devised a new data processing mechanism whose output is fed into an existing machine learning algorithm. To test our idea, we used a complete PHC data set (numerical only) without any missing values. We then generated missing values by randomly erasing part of the data set. We applied several existing imputation methods and our proposed method to the same target data set to compare their performance. We found that the Mean Imputer, kNN, and MissForest are not effective. The Iterative Imputer predicted best for 7 features and our method for 4. Therefore, it can be concluded that the effectiveness of imputation methods may vary depending on the specific data set and features. Our future work is to include the categorical data and monitor the performance.
UR - http://www.scopus.com/inward/record.url?scp=85126738357&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126738357&partnerID=8YFLogxK
U2 - 10.1109/JAC-ECC54461.2021.9691308
DO - 10.1109/JAC-ECC54461.2021.9691308
M3 - Conference contribution
AN - SCOPUS:85126738357
T3 - Proceedings of the 2021 International Japan-Africa Conference on Electronics, Communications, and Computations, JAC-ECC 2021
SP - 85
EP - 88
BT - Proceedings of the 2021 International Japan-Africa Conference on Electronics, Communications, and Computations, JAC-ECC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 December 2021 through 14 December 2021
ER -