TY - JOUR
T1 - New insights into hydrogen uptake on porous carbon materials via explainable machine learning
AU - Maulana Kusdhany, Muhammad Irfan
AU - Lyth, Stephen Matthew
N1 - Funding Information:
This work was supported by JST COI Grant Number JPMJCE1318 (Japan), and a JSPS KAKENHI Grant-in-Aid for Scientific Research B, Grant Number 19H02558 (Japan).
Five different models were evaluated for their predictive performance: (i) least squares linear regression (LR); (ii) support vector regressor with linear kernel (SVR(L)); (iii) SVR with radial basis function kernel (SVR(RBF)); (iv) extreme gradient boosted trees (XGBT, implemented using the XGBoost library); and (v) random forest regressor (RF). To tune the hyperparameters of each model, we performed group 5-fold cross-validation using either GridSearchCV() or RandomizedSearchCV() in scikit-learn (a free Python library for machine learning), with the parameters specified in Table S1. Group 5-fold cross-validation is used here instead of regular cross-validation to ensure that the models generalize well to unseen samples. The sample names are used as group labels so that, in each fold, the test set contains no data from carbon samples in the corresponding training set. If regular K-fold cross-validation were used instead, where the test-training split is completely randomized, the model might only need to interpolate or complete an isotherm for a known carbon sample, rather than generate an entirely new isotherm for an unknown sample. The difference between the two cross-validation methods is summarized in Fig. 1. These five models were used to predict excess hydrogen uptake based on the textural and chemical properties of the different carbon materials. The cross-validated performances of the different models are compared in Table 1. In addition, a comparison between the predicted and actual hydrogen uptake values for the different models is shown in Fig. 3.
Clearly, linear approximations are not well suited for this prediction task, since LR and SVR(L) performed significantly worse than the non-linear models. This result is to be expected for two reasons: first, the strong relationship between pressure and uptake is non-linear; second, linear models do not perform as well with multicollinear predictor variables [62]. Based on the performance metrics, the random forest (RF) regression method was selected due to its high performance (R2 > 0.9), and refit with the entire dataset. Notably, even with nested cross-validation, its runtime is not much longer than that of the other models, taking only several minutes. RF also has the advantage of being robust against multicollinearities [28], which we have demonstrated to exist in this dataset.
Publisher Copyright:
© 2021 The Author(s)
PY - 2021/7
Y1 - 2021/7
AB - To understand hydrogen uptake in porous carbon materials, we developed machine learning models to predict excess uptake at 77 K based on the textural and chemical properties of carbon, using a dataset containing 68 different samples and 1745 data points. Random forest is selected due to its high performance (R2 > 0.9), and analysis is performed using Shapley Additive Explanations (SHAP). It is found that pressure and Brunauer-Emmett-Teller (BET) surface area are the two strongest predictors of excess hydrogen uptake. Surprisingly, this is followed by a positive correlation with oxygen content, contributing up to ∼0.6 wt% additional hydrogen uptake, contradicting the conclusions of previous studies. Finally, pore volume has the smallest effect. The pore size distribution is also found to be important, since ultramicropores (dp < 0.7 nm) are found to be more positively correlated with excess uptake than micropores (dp < 2 nm). However, this effect is quite small compared to the role of BET surface area and total pore volume. The novel approach taken here can provide important insights in the rational design of carbon materials for hydrogen storage applications.
UR - http://www.scopus.com/inward/record.url?scp=85104385813&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85104385813&partnerID=8YFLogxK
U2 - 10.1016/j.carbon.2021.04.036
DO - 10.1016/j.carbon.2021.04.036
M3 - Article
AN - SCOPUS:85104385813
VL - 179
SP - 190
EP - 201
JO - Carbon
JF - Carbon
SN - 0008-6223
ER -