コードレビュー分析におけるデータクレンジングの影響調査

Translated title of the contribution: Investigating the Effect of Data Cleaning Techniques for Code Review Analysis

戸田 航史, 亀井 靖高, 吉田 則裕

Research output: Contribution to journal › Article

Abstract

In this paper, we investigate the effect of data cleansing techniques on code review analysis. We select three open source software projects (Android, Chromium, and OpenStack) and collect code review data from them. We apply two data cleansing techniques to the dataset: (1) removing bots from the reviewers, and (2) correcting the review start and end times used to calculate reviewing time. For each technique, we then compare the cleansed data with the uncleansed data and evaluate the effect. The results show that both cleansing techniques affect code review analysis, because (1) bots account for 19.4% in the OpenStack review data, and (2) the corrected reviewing time differs significantly from the uncorrected one. Additionally, we investigate how the correlation coefficient between reviewers' experience and reviewing time changes when both data cleansing techniques are applied. The result shows that the reviewer cleansing affects the correlation.
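
The two cleansing steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say how bots are identified or how the start and end times are corrected, so everything here is an assumption made for illustration: the bot account list, the record layout, the function names, and the choice to measure reviewing time from the first to the last human reviewer comment.

```python
from datetime import datetime

# Hypothetical bot accounts; the abstract does not list the ones the authors used.
BOT_ACCOUNTS = {"jenkins", "zuul"}


def is_bot(reviewer: str) -> bool:
    """Return True if the reviewer name matches a known bot account."""
    return reviewer.lower() in BOT_ACCOUNTS


def bot_share(comments) -> float:
    """Fraction of review comments posted by bot accounts."""
    if not comments:
        return 0.0
    return sum(is_bot(name) for name, _ in comments) / len(comments)


def raw_hours(review) -> float:
    """Uncleansed reviewing time: from change creation to change close."""
    return (review["closed"] - review["created"]).total_seconds() / 3600


def corrected_hours(review):
    """Cleansed reviewing time (assumed rule): first to last human comment.

    Returns None when no human reviewer commented on the change.
    """
    human_times = [t for name, t in review["comments"] if not is_bot(name)]
    if not human_times:
        return None
    return (max(human_times) - min(human_times)).total_seconds() / 3600


# Made-up record, used only to show the difference between the two measures.
example = {
    "created": datetime(2017, 1, 1, 9, 0),
    "closed": datetime(2017, 1, 3, 9, 0),
    "comments": [
        ("jenkins", datetime(2017, 1, 1, 9, 5)),
        ("alice", datetime(2017, 1, 2, 9, 0)),
        ("bob", datetime(2017, 1, 2, 15, 0)),
    ],
}

print(bot_share(example["comments"]))
print(raw_hours(example))
print(corrected_hours(example))
```

On this made-up record the uncleansed measure spans the whole lifetime of the change, while the cleansed measure covers only the interval in which human reviewers were active. That gap is the kind of difference the comparison in the abstract is concerned with.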
Original language: Japanese
Pages (from-to): 845-854
Number of pages: 10
Journal: 情報処理学会論文誌
Volume: 58
Issue number: 4
Publication status: Published - Apr 15 2017
