TY - JOUR
T1 - Can we benchmark Code Review studies? A systematic mapping study of methodology, dataset, and metric
AU - Wang, Dong
AU - Ueda, Yuki
AU - Kula, Raula Gaikovina
AU - Ishio, Takashi
AU - Matsumoto, Kenichi
N1 - Funding Information:
This work has been supported by JSPS KAKENHI, Japan Grant Numbers JP18H04094, JP20K19774, JP20J15163, and JP20H05706.
Publisher Copyright:
© 2021 Elsevier Inc.
PY - 2021/10
Y1 - 2021/10
N2 - Context: Code Review (CR) is a cornerstone of software quality assurance and a crucial practice in software development. As CR research matures, it can be difficult to keep track of the best practices and the state of the art in methodologies, datasets, and metrics. Objective: This paper investigates the potential for benchmarking by collecting the methodologies, datasets, and metrics of CR studies. Methods: A systematic mapping study was conducted. A total of 112 studies, drawn from 19,847 papers published in high-impact venues between 2011 and 2019, were selected and analyzed. Results: First, we find that empirical evaluation is the most common methodology (65% of papers), with solution and experience being the least common. Second, we highlight that 50% of the papers using a quantitative or mixed-method approach have the potential for replicability. Third, we identify 457 metrics grouped into sixteen core metric sets and applied to nine Software Engineering topics, showing that different research topics tend to use specific metric sets. Conclusion: We conclude that, at this stage, we cannot benchmark CR studies. Nevertheless, a common benchmark would help new researchers, including experts from other fields, to innovate new techniques and build on top of already established methodologies. A full replication package is available at https://naist-se.github.io/code-review/.
AB - Context: Code Review (CR) is a cornerstone of software quality assurance and a crucial practice in software development. As CR research matures, it can be difficult to keep track of the best practices and the state of the art in methodologies, datasets, and metrics. Objective: This paper investigates the potential for benchmarking by collecting the methodologies, datasets, and metrics of CR studies. Methods: A systematic mapping study was conducted. A total of 112 studies, drawn from 19,847 papers published in high-impact venues between 2011 and 2019, were selected and analyzed. Results: First, we find that empirical evaluation is the most common methodology (65% of papers), with solution and experience being the least common. Second, we highlight that 50% of the papers using a quantitative or mixed-method approach have the potential for replicability. Third, we identify 457 metrics grouped into sixteen core metric sets and applied to nine Software Engineering topics, showing that different research topics tend to use specific metric sets. Conclusion: We conclude that, at this stage, we cannot benchmark CR studies. Nevertheless, a common benchmark would help new researchers, including experts from other fields, to innovate new techniques and build on top of already established methodologies. A full replication package is available at https://naist-se.github.io/code-review/.
UR - http://www.scopus.com/inward/record.url?scp=85107587623&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107587623&partnerID=8YFLogxK
U2 - 10.1016/j.jss.2021.111009
DO - 10.1016/j.jss.2021.111009
M3 - Article
AN - SCOPUS:85107587623
VL - 180
JO - Journal of Systems and Software
JF - Journal of Systems and Software
SN - 0164-1212
M1 - 111009
ER -