Using search-based metric selection and oversampling to predict fault prone modules

R. Vivanco, Yasutaka Kamei, A. Monden, K. Matsumoto, D. Jin

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

6 Citations (Scopus)

Abstract

Predictive models can be used in the detection of fault prone modules using source code metrics as inputs for the classifier. However, there exist numerous structural measures that capture different aspects of size, coupling and complexity. Identifying a metric subset that enhances the performance for the predictive objective would not only improve the model but also provide insights into the structural properties that lead to problematic modules. Another difficulty in building predictive models comes from unbalanced datasets, which are common in empirical software engineering as a majority of the modules are not likely to be faulty. Oversampling attempts to overcome this deficiency by generating new training instances from the faulty modules. We present the results of applying search-based metric selection and oversampling to three NASA datasets. For these datasets, oversampling results in the largest improvement. Metric subset selection was able to reduce up to 52% of the metrics without decreasing the predictive performance gained with oversampling.
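The oversampling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling technique the authors used, so the code below assumes the simplest variant, random oversampling by duplication of faulty modules; the function name `random_oversample`, the `Module` type, and the label convention (1 = faulty, 0 = not faulty) are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: (metric vector, label), where label 1 = faulty.
Module = Tuple[Sequence[float], int]


def random_oversample(data: List[Module], seed: int = 0) -> List[Module]:
    """Duplicate randomly chosen faulty modules until both classes are equal in size.

    This is an assumed, simplified technique, not necessarily the one in the paper.
    """
    rng = random.Random(seed)
    faulty = [m for m in data if m[1] == 1]
    clean = [m for m in data if m[1] == 0]
    # Nothing to do if there are no faulty modules or they are not the minority.
    if not faulty or len(faulty) >= len(clean):
        return list(data)
    # Draw (with replacement) enough faulty modules to close the gap.
    extra = [rng.choice(faulty) for _ in range(len(clean) - len(faulty))]
    return list(data) + extra
```

Only the training split should be oversampled in a setup like this; duplicating faulty modules before splitting would leak copies of training instances into the test data.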

Original language: English
Title of host publication: 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010
DOI: https://doi.org/10.1109/CCECE.2010.5575249
Publication status: Published - 2010
Externally published: Yes
Event: 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010 - Calgary, AB, Canada
Duration: May 2 2010 to May 5 2010

Other

Other: 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010
Country: Canada
City: Calgary, AB
Period: 5/2/10 to 5/5/10

Fingerprint

NASA
Structural properties
Software engineering
Classifiers

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering
  • Hardware and Architecture

Cite this

Vivanco, R., Kamei, Y., Monden, A., Matsumoto, K., & Jin, D. (2010). Using search-based metric selection and oversampling to predict fault prone modules. In 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010 [5575249] https://doi.org/10.1109/CCECE.2010.5575249

Using search-based metric selection and oversampling to predict fault prone modules. / Vivanco, R.; Kamei, Yasutaka; Monden, A.; Matsumoto, K.; Jin, D.

2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010. 2010. 5575249.

Vivanco, R, Kamei, Y, Monden, A, Matsumoto, K & Jin, D 2010, Using search-based metric selection and oversampling to predict fault prone modules. in 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010., 5575249, 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010, Calgary, AB, Canada, 5/2/10. https://doi.org/10.1109/CCECE.2010.5575249
Vivanco R, Kamei Y, Monden A, Matsumoto K, Jin D. Using search-based metric selection and oversampling to predict fault prone modules. In 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010. 2010. 5575249 https://doi.org/10.1109/CCECE.2010.5575249
Vivanco, R. ; Kamei, Yasutaka ; Monden, A. ; Matsumoto, K. ; Jin, D. / Using search-based metric selection and oversampling to predict fault prone modules. 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010. 2010.
@inproceedings{b566386487da48a48ad018becb9ae763,
title = "Using search-based metric selection and oversampling to predict fault prone modules",
abstract = "Predictive models can be used in the detection of fault prone modules using source code metrics as inputs for the classifier. However, there exist numerous structural measures that capture different aspects of size, coupling and complexity. Identifying a metric subset that enhances the performance for the predictive objective would not only improve the model but also provide insights into the structural properties that lead to problematic modules. Another difficulty in building predictive models comes from unbalanced datasets, which are common in empirical software engineering as a majority of the modules are not likely to be faulty. Oversampling attempts to overcome this deficiency by generating new training instances from the faulty modules. We present the results of applying search-based metric selection and oversampling to three NASA datasets. For these datasets, oversampling results in the largest improvement. Metric subset selection was able to reduce up to 52{\%} of the metrics without decreasing the predictive performance gained with oversampling.",
author = "R. Vivanco and Yasutaka Kamei and A. Monden and K. Matsumoto and D. Jin",
year = "2010",
doi = "10.1109/CCECE.2010.5575249",
language = "English",
isbn = "9781424453764",
booktitle = "2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010",

}

TY - GEN

T1 - Using search-based metric selection and oversampling to predict fault prone modules

AU - Vivanco, R.

AU - Kamei, Yasutaka

AU - Monden, A.

AU - Matsumoto, K.

AU - Jin, D.

PY - 2010

Y1 - 2010

N2 - Predictive models can be used in the detection of fault prone modules using source code metrics as inputs for the classifier. However, there exist numerous structural measures that capture different aspects of size, coupling and complexity. Identifying a metric subset that enhances the performance for the predictive objective would not only improve the model but also provide insights into the structural properties that lead to problematic modules. Another difficulty in building predictive models comes from unbalanced datasets, which are common in empirical software engineering as a majority of the modules are not likely to be faulty. Oversampling attempts to overcome this deficiency by generating new training instances from the faulty modules. We present the results of applying search-based metric selection and oversampling to three NASA datasets. For these datasets, oversampling results in the largest improvement. Metric subset selection was able to reduce up to 52% of the metrics without decreasing the predictive performance gained with oversampling.

AB - Predictive models can be used in the detection of fault prone modules using source code metrics as inputs for the classifier. However, there exist numerous structural measures that capture different aspects of size, coupling and complexity. Identifying a metric subset that enhances the performance for the predictive objective would not only improve the model but also provide insights into the structural properties that lead to problematic modules. Another difficulty in building predictive models comes from unbalanced datasets, which are common in empirical software engineering as a majority of the modules are not likely to be faulty. Oversampling attempts to overcome this deficiency by generating new training instances from the faulty modules. We present the results of applying search-based metric selection and oversampling to three NASA datasets. For these datasets, oversampling results in the largest improvement. Metric subset selection was able to reduce up to 52% of the metrics without decreasing the predictive performance gained with oversampling.

UR - http://www.scopus.com/inward/record.url?scp=78049348524&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78049348524&partnerID=8YFLogxK

U2 - 10.1109/CCECE.2010.5575249

DO - 10.1109/CCECE.2010.5575249

M3 - Conference contribution

AN - SCOPUS:78049348524

SN - 9781424453764

BT - 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010

ER -