Thresholds for Size and Complexity Metrics: A Case Study from the Perspective of Defect Density

Kazuhiro Yamashita, Changyun Huang, Meiyappan Nagappan, Yasutaka Kamei, Audris Mockus, Ahmed E. Hassan, Naoyasu Ubayashi

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

12 Citations (Scopus)

Abstract

Practical guidelines on what code has better quality are in great demand. For example, it is reasonable to expect the most complex code to be buggy. Structuring code into reasonably sized files and classes also appears to be prudent. Many attempts to determine (or declare) risk thresholds for various code metrics have been made. In this paper we want to examine the applicability of such thresholds. Hence, we replicate a recently published technique for calculating metric thresholds to determine high-risk files based on code size (LOC and number of methods), and complexity (cyclomatic complexity and module interface coupling) using a very large set of open and closed source projects written primarily in Java. We relate the threshold-derived risk to (a) the probability that a file would have a defect, and (b) the defect density of the files in the high-risk group. We find that the probability of a file having a defect is higher in the very high-risk group with a few exceptions. This is particularly pronounced when using size thresholds. Surprisingly, the defect density was uniformly lower in the very high-risk group of files. Our results suggest that, as expected, less code is associated with fewer defects. However, the same amount of code in large and complex files was associated with fewer defects than when located in smaller and less complex files. Hence we conclude that risk thresholds for size and complexity metrics have to be used with caution if at all. Our findings have immediate practical implications: The redistribution of Java code into smaller and less complex files may be counterproductive.
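The abstract contrasts two per-group measures: (a) the share of files that have at least one defect and (b) the defect density of the group. The Python sketch below illustrates that distinction under stated assumptions only. The abstract does not specify the replicated threshold technique, so the LOC-weighted percentile cut-offs (70/80/90 %), the four risk labels, and every name used here (`FileRecord`, `weighted_thresholds`, `risk_group`, `group_stats`, `TOY_FILES`) are hypothetical. The toy data are invented and are not taken from the study; they merely show that a group can have a higher probability of containing defective files and still a lower defect density.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical labels for the four bands delimited by three thresholds (assumed scheme).
LABELS = ("low", "moderate", "high", "very high")


@dataclass
class FileRecord:
    """Hypothetical per-file record: one metric value, its size, and its defect count."""
    name: str
    metric: float  # metric being thresholded, e.g. cyclomatic complexity
    loc: int       # lines of code: the weight for cut-offs and the density denominator
    defects: int   # number of defects attributed to the file


def weighted_thresholds(files: Sequence[FileRecord],
                        percentiles: Sequence[float] = (0.70, 0.80, 0.90)) -> List[float]:
    """For each percentile p, return the smallest metric value at which the cumulative
    LOC of files sorted by metric reaches p of the total LOC (assumed technique)."""
    ordered = sorted(files, key=lambda f: f.metric)
    total_loc = sum(f.loc for f in ordered)
    thresholds: List[float] = []
    for p in percentiles:
        cumulative = 0
        for f in ordered:
            cumulative += f.loc
            if cumulative >= p * total_loc:
                thresholds.append(f.metric)
                break
    return thresholds


def risk_group(value: float, thresholds: Sequence[float]) -> str:
    """Map a metric value to a label by counting the thresholds it strictly exceeds.
    Assumes exactly three thresholds, matching the four labels."""
    return LABELS[sum(value > t for t in thresholds)]


def group_stats(files: Sequence[FileRecord],
                thresholds: Sequence[float]) -> Dict[str, Dict[str, float]]:
    """Per risk group: share of files with at least one defect, and defects per KLOC."""
    totals: Dict[str, Dict[str, int]] = {}
    for f in files:
        t = totals.setdefault(risk_group(f.metric, thresholds),
                              {"files": 0, "defective": 0, "defects": 0, "loc": 0})
        t["files"] += 1
        t["defective"] += int(f.defects > 0)
        t["defects"] += f.defects
        t["loc"] += f.loc
    return {
        group: {
            "p_defect": t["defective"] / t["files"],
            "defects_per_kloc": 1000 * t["defects"] / t["loc"],
        }
        for group, t in totals.items()
    }


# Invented toy data for illustration only; not results from the study.
TOY_FILES = [
    FileRecord("A.java", 1, 500, 15),
    FileRecord("B.java", 2, 1000, 0),
    FileRecord("C.java", 3, 150, 0),
    FileRecord("D.java", 4, 200, 1),
    FileRecord("E.java", 8, 200, 1),
]

if __name__ == "__main__":
    th = weighted_thresholds(TOY_FILES)
    print("thresholds:", th)
    stats = group_stats(TOY_FILES, th)
    for label in LABELS:
        if label in stats:
            s = stats[label]
            print(f"{label:>9}: P(defect)={s['p_defect']:.2f}, "
                  f"defects/KLOC={s['defects_per_kloc']:.1f}")
```

Run as a script, this prints the thresholds [2, 3, 4]; the "very high" group has P(defect) = 1.00 but only 5.0 defects/KLOC, against 0.50 and 10.0 for the "low" group. Weighting by LOC is a design choice of this sketch: it places each cut-off where a given share of the code, rather than of the files, lies below it.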

Original language: English
Title of host publication: Proceedings - 2016 IEEE International Conference on Software Quality, Reliability and Security, QRS 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 191-201
Number of pages: 11
ISBN (Electronic): 9781509041275
DOI: 10.1109/QRS.2016.31
Publication status: Published - Oct 12 2016
Event: 2nd IEEE International Conference on Software Quality, Reliability and Security, QRS 2016 - Vienna, Austria
Duration: Aug 1 2016 - Aug 3 2016

Publication series

Name: Proceedings - 2016 IEEE International Conference on Software Quality, Reliability and Security, QRS 2016

Other

Other: 2nd IEEE International Conference on Software Quality, Reliability and Security, QRS 2016
Country: Austria
City: Vienna
Period: 8/1/16 - 8/3/16

Fingerprint

Defect density
Defects

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Software
  • Safety, Risk, Reliability and Quality

Cite this

Yamashita, K., Huang, C., Nagappan, M., Kamei, Y., Mockus, A., Hassan, A. E., & Ubayashi, N. (2016). Thresholds for Size and Complexity Metrics: A Case Study from the Perspective of Defect Density. In Proceedings - 2016 IEEE International Conference on Software Quality, Reliability and Security, QRS 2016 (pp. 191-201). [7589799] (Proceedings - 2016 IEEE International Conference on Software Quality, Reliability and Security, QRS 2016). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/QRS.2016.31

@inproceedings{7772c6bfe35340248200bfa479a4a1ef,
title = "Thresholds for Size and Complexity Metrics: A Case Study from the Perspective of Defect Density",
author = "Kazuhiro Yamashita and Changyun Huang and Meiyappan Nagappan and Yasutaka Kamei and Audris Mockus and Hassan, {Ahmed E.} and Naoyasu Ubayashi",
year = "2016",
month = "10",
day = "12",
doi = "10.1109/QRS.2016.31",
language = "English",
series = "Proceedings - 2016 IEEE International Conference on Software Quality, Reliability and Security, QRS 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "191--201",
booktitle = "Proceedings - 2016 IEEE International Conference on Software Quality, Reliability and Security, QRS 2016",
address = "United States",

}

TY - GEN

T1 - Thresholds for Size and Complexity Metrics

T2 - A Case Study from the Perspective of Defect Density

AU - Yamashita, Kazuhiro

AU - Huang, Changyun

AU - Nagappan, Meiyappan

AU - Kamei, Yasutaka

AU - Mockus, Audris

AU - Hassan, Ahmed E.

AU - Ubayashi, Naoyasu

PY - 2016/10/12

Y1 - 2016/10/12

UR - http://www.scopus.com/inward/record.url?scp=84995468893&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84995468893&partnerID=8YFLogxK

U2 - 10.1109/QRS.2016.31

DO - 10.1109/QRS.2016.31

M3 - Conference contribution

AN - SCOPUS:84995468893

T3 - Proceedings - 2016 IEEE International Conference on Software Quality, Reliability and Security, QRS 2016

SP - 191

EP - 201

BT - Proceedings - 2016 IEEE International Conference on Software Quality, Reliability and Security, QRS 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -