TY - GEN
T1 - Using search-based metric selection and oversampling to predict fault prone modules
AU - Vivanco, R.
AU - Kamei, Y.
AU - Monden, A.
AU - Matsumoto, K.
AU - Jin, D.
PY - 2010/11/5
Y1 - 2010/11/5
N2 - Predictive models can be used in the detection of fault prone modules using source code metrics as inputs for the classifier. However, there exist numerous structural measures that capture different aspects of size, coupling and complexity. Identifying a metric subset that enhances the performance for the predictive objective would not only improve the model but also provide insights into the structural properties that lead to problematic modules. Another difficulty in building predictive models comes from unbalanced datasets, which are common in empirical software engineering as a majority of the modules are not likely to be faulty. Oversampling attempts to overcome this deficiency by generating new training instances from the faulty modules. We present the results of applying search-based metric selection and oversampling to three NASA datasets. For these datasets, oversampling results in the largest improvement. Metric subset selection was able to reduce up to 52% of the metrics without decreasing the predictive performance gained with oversampling.
AB - Predictive models can be used in the detection of fault prone modules using source code metrics as inputs for the classifier. However, there exist numerous structural measures that capture different aspects of size, coupling and complexity. Identifying a metric subset that enhances the performance for the predictive objective would not only improve the model but also provide insights into the structural properties that lead to problematic modules. Another difficulty in building predictive models comes from unbalanced datasets, which are common in empirical software engineering as a majority of the modules are not likely to be faulty. Oversampling attempts to overcome this deficiency by generating new training instances from the faulty modules. We present the results of applying search-based metric selection and oversampling to three NASA datasets. For these datasets, oversampling results in the largest improvement. Metric subset selection was able to reduce up to 52% of the metrics without decreasing the predictive performance gained with oversampling.
UR - http://www.scopus.com/inward/record.url?scp=78049348524&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78049348524&partnerID=8YFLogxK
U2 - 10.1109/CCECE.2010.5575249
DO - 10.1109/CCECE.2010.5575249
M3 - Conference contribution
AN - SCOPUS:78049348524
SN - 9781424453764
T3 - Canadian Conference on Electrical and Computer Engineering
BT - 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010
T2 - 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010
Y2 - 2 May 2010 through 5 May 2010
ER -