Using search-based metric selection and oversampling to predict fault prone modules

R. Vivanco, Y. Kamei, A. Monden, K. Matsumoto, D. Jin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

Predictive models can be used to detect fault-prone modules, using source code metrics as inputs to the classifier. However, there exist numerous structural measures that capture different aspects of size, coupling and complexity. Identifying a metric subset that enhances performance for the predictive objective would not only improve the model but also provide insights into the structural properties that lead to problematic modules. Another difficulty in building predictive models comes from unbalanced datasets, which are common in empirical software engineering because the majority of modules are not likely to be faulty. Oversampling attempts to overcome this deficiency by generating new training instances from the faulty modules. We present the results of applying search-based metric selection and oversampling to three NASA datasets. For these datasets, oversampling results in the largest improvement. Metric subset selection was able to eliminate up to 52% of the metrics without decreasing the predictive performance gained with oversampling.
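
To make the oversampling idea in the abstract concrete, the following is a minimal sketch only. It assumes the simplest variant, random duplication of faulty-module rows until the two classes are the same size; the abstract does not say which oversampling technique the paper uses, so this should not be read as the authors' method. The function name `oversample_minority`, the toy metric values, and the label encoding (1 = faulty) are all hypothetical.

```python
import random


def oversample_minority(rows, labels, minority_label=1, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced.

    Assumption: plain random oversampling with replacement, chosen here only
    for illustration; it is not taken from the paper.
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority-class instances to sample from")
    majority_count = sum(1 for y in labels if y != minority_label)
    # Number of extra minority rows needed to match the majority class size.
    deficit = max(0, majority_count - len(minority))
    extra = [rng.choice(minority) for _ in range(deficit)]
    return list(rows) + extra, list(labels) + [minority_label] * len(extra)


# Hypothetical toy data: each row is (lines_of_code, cyclomatic_complexity).
rows = [(120, 4), (80, 2), (300, 15), (45, 1), (60, 3), (500, 22)]
labels = [0, 0, 1, 0, 0, 0]  # 1 = faulty module (assumed encoding)

balanced_rows, balanced_labels = oversample_minority(rows, labels)
```

Only the training split should be resampled in a sketch like this; evaluating on duplicated rows would overstate predictive performance.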

Original language: English
Title of host publication: 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010
Publication status: Published - Nov 5 2010
Externally published: Yes
Event: 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010 - Calgary, AB, Canada
Duration: May 2 2010 to May 5 2010

Publication series

Name: Canadian Conference on Electrical and Computer Engineering
ISSN (Print): 0840-7789

Other

Other: 2010 23rd Canadian Conference on Electrical and Computer Engineering, CCECE 2010
Country/Territory: Canada
City: Calgary, AB
Period: 5/2/10 to 5/5/10

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Electrical and Electronic Engineering
