An empirical study of just-in-time defect prediction using cross-project models

Takafumi Fukushima, Yasutaka Kamei, Shane McIntosh, Kazuhiro Yamashita, Naoyasu Ubayashi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

53 Citations (Scopus)

Abstract

Prior research suggests that predicting defect-inducing changes, i.e., Just-In-Time (JIT) defect prediction, is a more practical alternative to traditional defect prediction techniques, providing immediate feedback while design decisions are still fresh in the minds of developers. Unfortunately, like traditional defect prediction models, JIT models require a large amount of training data, which is not available when projects are in their initial development phases. To address this flaw in traditional defect prediction, prior work has proposed cross-project models, i.e., models learned from older projects with sufficient history. However, cross-project models have not yet been explored in the context of JIT prediction. Therefore, in this study, we empirically evaluate the performance of JIT cross-project models. Through a case study on 11 open source projects, we find that in a JIT cross-project context: (1) high-performance within-project models rarely perform well; (2) models trained on projects that have similar correlations between predictor and dependent variables often perform well; and (3) ensemble learning techniques that leverage historical data from several other projects (e.g., voting experts) often perform well. Our findings empirically confirm that JIT cross-project models learned using other projects are a viable solution for projects with little historical data. However, JIT cross-project models perform best when the data used to learn them is carefully selected.
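
The "voting experts" idea mentioned in finding (3) can be illustrated with a minimal sketch. This is not the authors' implementation: the metric names (lines_added, files_touched), the thresholds, and the helper names make_threshold_expert and vote are all hypothetical, and each "expert" is a toy stand-in for a model trained on one other project. The sketch only shows plain majority voting over several such models.

```python
from typing import Callable, Dict, List, Sequence

# A change is described by a few metrics; the metric names are assumptions.
Change = Dict[str, float]
# An "expert" is any model learned from one other project. Here it is a plain
# callable that returns True when it predicts the change is defect-inducing.
Expert = Callable[[Change], bool]


def make_threshold_expert(metric: str, threshold: float) -> Expert:
    """Toy stand-in for a model trained on one source project (hypothetical)."""
    def predict(change: Change) -> bool:
        return change.get(metric, 0.0) > threshold
    return predict


def vote(experts: Sequence[Expert], change: Change) -> bool:
    """Majority vote: flag the change if more than half of the experts do."""
    votes = sum(1 for expert in experts if expert(change))
    return votes * 2 > len(experts)


# Three experts, each imagined as learned from a different project.
experts: List[Expert] = [
    make_threshold_expert("lines_added", 100),
    make_threshold_expert("files_touched", 5),
    make_threshold_expert("lines_added", 300),
]

risky = vote(experts, {"lines_added": 250.0, "files_touched": 8.0})
safe = vote(experts, {"lines_added": 20.0, "files_touched": 1.0})
```

In this sketch a strict majority is required, so a tie or an empty list of experts yields "not flagged"; a real ensemble might instead weight each expert or average predicted probabilities.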

Original language: English
Title of host publication: 11th Working Conference on Mining Software Repositories, MSR 2014 - Proceedings
Publisher: Association for Computing Machinery, Inc
Pages: 172-181
Number of pages: 10
ISBN (Electronic): 9781450328630
DOIs: https://doi.org/10.1145/2597073.2597075
Publication status: Published - May 31 2014
Event: 11th International Working Conference on Mining Software Repositories, MSR 2014 - Hyderabad, India
Duration: May 31 2014 to Jun 1 2014

Other

Other: 11th International Working Conference on Mining Software Repositories, MSR 2014
Country: India
City: Hyderabad
Period: 5/31/14 to 6/1/14

Fingerprint

Defects
Feedback

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software

Cite this

Fukushima, T., Kamei, Y., McIntosh, S., Yamashita, K., & Ubayashi, N. (2014). An empirical study of just-in-time defect prediction using cross-project models. In 11th Working Conference on Mining Software Repositories, MSR 2014 - Proceedings (pp. 172-181). Association for Computing Machinery, Inc. https://doi.org/10.1145/2597073.2597075

@inproceedings{fb4a90010842475eb1817c587c2846d9,
title = "An empirical study of just-in-time defect prediction using cross-project models",
author = "Takafumi Fukushima and Yasutaka Kamei and Shane McIntosh and Kazuhiro Yamashita and Naoyasu Ubayashi",
year = "2014",
month = "5",
day = "31",
doi = "10.1145/2597073.2597075",
language = "English",
pages = "172--181",
booktitle = "11th Working Conference on Mining Software Repositories, MSR 2014 - Proceedings",
publisher = "Association for Computing Machinery, Inc",
}

TY - GEN
T1 - An empirical study of just-in-time defect prediction using cross-project models
AU - Fukushima, Takafumi
AU - Kamei, Yasutaka
AU - McIntosh, Shane
AU - Yamashita, Kazuhiro
AU - Ubayashi, Naoyasu
PY - 2014/5/31
Y1 - 2014/5/31
UR - http://www.scopus.com/inward/record.url?scp=84938794114&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84938794114&partnerID=8YFLogxK
U2 - 10.1145/2597073.2597075
DO - 10.1145/2597073.2597075
M3 - Conference contribution
SP - 172
EP - 181
BT - 11th Working Conference on Mining Software Repositories, MSR 2014 - Proceedings
PB - Association for Computing Machinery, Inc
ER -