TY - JOUR
T1 - An empirical study of issue-link algorithms
T2 - which issue-link algorithms should we use?
AU - Kondo, Masanari
AU - Kashiwa, Yutaro
AU - Kamei, Yasutaka
AU - Mizuno, Osamu
N1 - Funding Information:
This work has been supported by JSPS KAKENHI Japan (Grant Numbers: JP19J23477 and JP18H03222) and JSPS International Joint Research Program with SNSF (Project “SENSOR”). We would like to thank Editage (www.editage.com) for English language editing.
Publisher Copyright:
© 2022, The Author(s).
PY - 2022/11
Y1 - 2022/11
AB - The accuracy of the SZZ algorithm is pivotal for just-in-time defect prediction because most prior studies have used the SZZ algorithm to detect defect-inducing commits when constructing and evaluating their defect prediction models. The SZZ algorithm detects defect-inducing commits in two phases: (1) linking issue reports in an issue-tracking system to possible defect-fixing commits in a version control system by using an issue-link algorithm (ILA); and (2) tracing the modifications of defect-fixing commits back to possible defect-inducing commits. Researchers and practitioners can address the second phase by using existing solutions such as a tool called cregit. In contrast, although various ILAs have been proposed for the first phase, no large-scale studies exist in which such ILAs are evaluated under the same experimental conditions. Hence, there is still no conclusion regarding the best-performing ILA for the first phase. In this paper, we compare 10 ILAs collected from our systematic literature study with regard to the accuracy of detecting defect-fixing commits. In addition, we compare the defect prediction performance of the ILAs, and combinations of them, that can detect defect-fixing commits accurately. We conducted experiments on five open-source software projects. We found that all ILAs and their combinations prevented the defect prediction model from being affected by missing defect-fixing commits. In particular, the combination of a natural language text similarity approach, Phantom heuristics, a random forest approach, and a support vector machine approach performed best, statistically significantly reducing the absolute differences from the ground-truth defect prediction performance. We summarize guidelines for using ILAs as our recommendations.
UR - http://www.scopus.com/inward/record.url?scp=85134559954&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134559954&partnerID=8YFLogxK
U2 - 10.1007/s10664-022-10120-x
DO - 10.1007/s10664-022-10120-x
M3 - Article
AN - SCOPUS:85134559954
VL - 27
JO - Empirical Software Engineering
JF - Empirical Software Engineering
SN - 1382-3256
IS - 6
M1 - 136
ER -