TY - GEN
T1 - Investigating the Effects of Balanced Training and Testing Datasets on Effort-Aware Fault Prediction Models
AU - Bennin, Kwabena Ebo
AU - Keung, Jacky
AU - Monden, Akito
AU - Kamei, Yasutaka
AU - Ubayashi, Naoyasu
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/8/24
Y1 - 2016/8/24
AB - To prioritize software quality assurance efforts, fault prediction models have been proposed to distinguish faulty modules from clean modules. The performances of such models are often biased due to the skewness or class imbalance of the datasets considered. To improve the prediction performance of these models, sampling techniques have been employed to rebalance the distribution of fault-prone and non-fault-prone modules. The effects of these techniques have been evaluated in terms of accuracy, geometric mean, and F1-measure in previous studies; however, these measures do not consider the effort needed to fix faults. To empirically investigate the effect of sampling techniques on the performance of software fault prediction models in a more realistic setting, this study employs Norm(Popt), an effort-aware measure that considers the testing effort. We performed two sets of experiments aimed at (1) assessing the effects of sampling techniques on effort-aware models and finding the appropriate class distribution for training datasets, and (2) investigating the role of balanced training and testing datasets on the performance of predictive models. Of the four sampling techniques applied, the over-sampling techniques outperformed the under-sampling techniques, with Random Over-sampling performing best with respect to the Norm(Popt) evaluation measure. Also, the performance of all the prediction models improved when sampling techniques were applied at rates of 20-30% on the training datasets, implying that a strictly balanced dataset (50% faulty modules and 50% clean modules) does not result in the best performance for effort-aware models. Our results also indicate that the performance of effort-aware models is significantly dependent on the proportions of the two classes in the testing dataset. Models trained on moderately balanced datasets are more likely to withstand fluctuations in performance as the class distribution in the testing data varies.
UR - http://www.scopus.com/inward/record.url?scp=84987981885&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84987981885&partnerID=8YFLogxK
U2 - 10.1109/COMPSAC.2016.144
DO - 10.1109/COMPSAC.2016.144
M3 - Conference contribution
AN - SCOPUS:84987981885
T3 - Proceedings - International Computer Software and Applications Conference
SP - 154
EP - 163
BT - Proceedings - 2016 IEEE 40th Annual Computer Software and Applications Conference, COMPSAC 2016
A2 - Claycomb, William
A2 - Milojicic, Dejan
A2 - Liu, Ling
A2 - Matskin, Mihhail
A2 - Zhang, Zhiyong
A2 - Reisman, Sorel
A2 - Sato, Hiroyuki
A2 - Ahamed, Sheikh Iqbal
PB - IEEE Computer Society
T2 - 2016 IEEE 40th Annual Computer Software and Applications Conference, COMPSAC 2016
Y2 - 10 June 2016 through 14 June 2016
ER -