Is lines of code a good measure of effort in effort-aware models?

Emad Shihab, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

Context: Effort-aware models, such as effort-aware bug prediction models, aim to help practitioners identify and prioritize buggy software locations according to the effort involved in fixing the bugs. Since the effort of current bugs is not yet known and the effort of past bugs is typically not explicitly recorded, effort-aware bug prediction models are forced to use approximations, such as the number of lines of code (LOC) of the predicted files.

Objective: Although the choice of these approximations is critical for the performance of the prediction models, there is no empirical evidence on whether LOC is actually a good approximation. Therefore, in this paper, we investigate the question: is LOC a good measure of effort for use in effort-aware models?

Method: We perform an empirical study on four open source projects, for which we obtain explicitly-recorded effort data, and compare the use of LOC to various complexity, size and churn metrics as measures of effort.

Results: We find that a combination of complexity, size and churn metrics is a better measure of effort than LOC alone. Furthermore, we examine the impact of our findings on previous effort-aware bug prediction work and find that using LOC as a measure of effort does not significantly affect the list of files being flagged; however, using LOC under-estimates the amount of effort required compared to our best effort predictor by approximately 66%.

Conclusion: Studies using effort-aware models should not assume that LOC is a good measure of effort. For the case of effort-aware bug prediction, using LOC provides results similar to combining complexity, churn, size and LOC as a proxy for effort when prioritizing the most risky files. However, for the purpose of effort estimation, using LOC may under-estimate the amount of effort required.
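The core idea the abstract describes can be sketched in code: effort-aware prioritization ranks files by predicted bug risk per unit of inspection effort, so the choice of effort proxy (LOC alone vs. a combination of size, complexity and churn) can change which files float to the top. The sketch below is illustrative only, not the paper's actual models; the file names, metric values and combination weights are all hypothetical (the paper fits its models against explicitly-recorded effort data instead).

```python
from dataclasses import dataclass

@dataclass
class FileMetrics:
    name: str
    risk: float        # predicted bug-proneness (from any classifier)
    loc: int           # lines of code (the proxy under study)
    complexity: float  # e.g. cyclomatic complexity (hypothetical values)
    churn: int         # recently added/changed lines (hypothetical values)

def effort_loc(f: FileMetrics) -> float:
    """Effort proxy used by prior effort-aware work: LOC alone."""
    return float(f.loc)

def effort_combined(f: FileMetrics) -> float:
    """Hypothetical combined proxy mixing size, complexity and churn.
    The weights here are arbitrary, purely for illustration."""
    return 0.4 * f.loc + 0.4 * f.complexity + 0.2 * f.churn

def prioritize(files, effort):
    # Rank by "risk density": expected bugs found per unit of inspection effort.
    return sorted(files, key=lambda f: f.risk / effort(f), reverse=True)

files = [
    FileMetrics("a.c", risk=0.8, loc=1000, complexity=20,  churn=10),
    FileMetrics("b.c", risk=0.6, loc=100,  complexity=150, churn=400),
    FileMetrics("c.c", risk=0.9, loc=300,  complexity=10,  churn=5),
]

# For this (hypothetical) data, the two proxies produce different rankings.
print([f.name for f in prioritize(files, effort_loc)])
print([f.name for f in prioritize(files, effort_combined)])
```

Swapping the `effort` function is the only change needed to compare proxies, which mirrors the paper's setup of holding the prediction fixed while varying the effort measure.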

Original language: English
Pages (from-to): 1981-1993
Number of pages: 13
Journal: Information and Software Technology
Volume: 55
Issue number: 11
DOI: 10.1016/j.infsof.2013.06.002
Publication status: Published - Nov 1 2013

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Computer Science Applications

Cite this

Is lines of code a good measure of effort in effort-aware models? / Shihab, Emad; Kamei, Yasutaka; Adams, Bram; Hassan, Ahmed E.

In: Information and Software Technology, Vol. 55, No. 11, 01.11.2013, p. 1981-1993.


@article{4eb58e5dbe904a5ba75b9ba9e0040d85,
  title     = "Is lines of code a good measure of effort in effort-aware models?",
  abstract  = "Context: Effort-aware models, such as effort-aware bug prediction models, aim to help practitioners identify and prioritize buggy software locations according to the effort involved in fixing the bugs. Since the effort of current bugs is not yet known and the effort of past bugs is typically not explicitly recorded, effort-aware bug prediction models are forced to use approximations, such as the number of lines of code (LOC) of the predicted files. Objective: Although the choice of these approximations is critical for the performance of the prediction models, there is no empirical evidence on whether LOC is actually a good approximation. Therefore, in this paper, we investigate the question: is LOC a good measure of effort for use in effort-aware models? Method: We perform an empirical study on four open source projects, for which we obtain explicitly-recorded effort data, and compare the use of LOC to various complexity, size and churn metrics as measures of effort. Results: We find that a combination of complexity, size and churn metrics is a better measure of effort than LOC alone. Furthermore, we examine the impact of our findings on previous effort-aware bug prediction work and find that using LOC as a measure of effort does not significantly affect the list of files being flagged; however, using LOC under-estimates the amount of effort required compared to our best effort predictor by approximately 66{\%}. Conclusion: Studies using effort-aware models should not assume that LOC is a good measure of effort. For the case of effort-aware bug prediction, using LOC provides results similar to combining complexity, churn, size and LOC as a proxy for effort when prioritizing the most risky files. However, for the purpose of effort estimation, using LOC may under-estimate the amount of effort required.",
  author    = "Shihab, Emad and Kamei, Yasutaka and Adams, Bram and Hassan, {Ahmed E.}",
  year      = "2013",
  month     = "11",
  day       = "1",
  doi       = "10.1016/j.infsof.2013.06.002",
  language  = "English",
  volume    = "55",
  pages     = "1981--1993",
  journal   = "Information and Software Technology",
  issn      = "0950-5849",
  publisher = "Elsevier",
  number    = "11",
}
