In this paper, we investigate the effect of data cleansing techniques on code review analysis. We select three open source software projects, Android, Chromium, and OpenStack, and collect code review data from each. We apply two data cleansing techniques to the dataset: (1) removing bots from the set of reviewers, and (2) correcting the review start and end times used to calculate reviewing time. We then compare the cleansed and uncleansed data for each technique to evaluate its effect. The results show that both techniques affect code review analysis: (1) bots account for 19.4% of OpenStack review activity, and (2) the corrected reviewing time differs significantly from the uncorrected one. Additionally, we investigate how the correlation coefficient between reviewers' experience and reviewing time changes when both cleansing techniques are applied. The results show that cleansing the reviewer data affects this correlation.
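The two cleansing steps and the correlation check can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract, all hypothetical: bots are flagged by name tokens such as `bot`, `ci`, `jenkins`, or `zuul`; the "corrected" reviewing time starts at the latest patch set uploaded before the reviewer's vote rather than at the first upload; experience is a plain prior-review count; and Spearman's rank correlation stands in for whichever coefficient the study actually used. The names `Review`, `is_bot`, `remove_bots`, `reviewing_hours`, `ranks`, and `spearman` are illustrative only, and the data are toy values, not results from the paper.

```python
import math
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical bot heuristic: name tokens typical of CI/bot accounts.
BOT_KEYWORDS = {"bot", "ci", "jenkins", "zuul"}


@dataclass
class Review:
    reviewer: str                     # reviewer account name
    patchset_uploads: list[datetime]  # upload time of each patch set of the change
    vote_time: datetime               # time the reviewer posted their review
    experience: int                   # hypothetical metric: prior review count


def is_bot(name: str) -> bool:
    """Flag an account whose name contains a bot-like token, e.g. 'Zuul' or 'OpenStack CI'."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return any(tok in BOT_KEYWORDS for tok in tokens)


def remove_bots(reviews: list[Review]) -> list[Review]:
    """Cleansing step 1: drop reviews posted by accounts flagged as bots."""
    return [r for r in reviews if not is_bot(r.reviewer)]


def reviewing_hours(r: Review, corrected: bool = True) -> float:
    """Cleansing step 2 (hypothetical rule): when corrected, start the clock at the
    latest patch set uploaded before the vote instead of at the first upload."""
    if corrected:
        start = max(t for t in r.patchset_uploads if t <= r.vote_time)
    else:
        start = min(r.patchset_uploads)
    return (r.vote_time - start).total_seconds() / 3600


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(sxx * syy)


# Toy usage example with made-up timestamps.
base = datetime(2017, 1, 1)


def at(hours: int) -> datetime:
    return base + timedelta(hours=hours)


reviews = [
    Review("alice", [at(0), at(10)], at(12), 50),
    Review("bob", [at(0)], at(8), 5),
    Review("carol", [at(0), at(4)], at(9), 20),
    Review("Zuul", [at(0)], at(1), 1000),
]
humans = remove_bots(reviews)
exp = [r.experience for r in humans]
print(spearman(exp, [reviewing_hours(r, corrected=False) for r in humans]))  # 1.0
print(spearman(exp, [reviewing_hours(r) for r in humans]))                   # -1.0
```

In this toy data the sign of the correlation flips once the start time is corrected, which shows why the choice of start and end time can matter for such an analysis; it says nothing about the magnitudes reported in the study.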
|Translated title of the contribution||Investigating the Effect of Data Cleaning Techniques for Code Review Analysis|
|Number of pages||10|
|Publication status||Published - Apr 15 2017|