Which Metrics Should Researchers Use to Collect Repositories: An Empirical Study

Kai Yamamoto, Masanari Kondo, Kinari Nishiura, Osamu Mizuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

GitHub is a large, publicly available development platform that hosts Git-based version control repositories, and software developers commonly host a wide range of projects there. Researchers interested in mining software repositories therefore frequently use GitHub to collect software projects as datasets. GitHub exposes repository metrics that reflect, for example, popularity, contribution, and interest. We believe that such metrics are related to software quality, and we use them to select repositories that suit our research purpose. However, to the best of our knowledge, there is no evidence to support this assumption. Our main purpose is to provide researchers who study software quality, especially issue management, with repository metrics for selecting appropriate repositories. In this paper, we study how the characteristics of repositories' issue pages differ depending on which repository metric is used to select the repositories, in order to identify the metric best suited to selecting proper repositories. The highlights of our findings are as follows: (1) The number of contributors, i.e., the number of developers who contribute to a GitHub repository, can be used to select repositories whose issue pages are well maintained. More specifically, the issue pages of these repositories contain more issues, and their developers use labels more frequently, than those of repositories selected by other metrics. (2) The number of dependencies selects repositories that have fewer issues and whose developers use labels less often than those selected by other metrics.
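The comparison described in the abstract (select repositories by one metric, then inspect their issue pages) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the `Repo` and `Issue` classes, the `select_top`, `label_ratio`, and `summarize` helpers, and the toy repositories are all hypothetical, and "label usage" is approximated as the fraction of issues that carry at least one label. Real data would have to be collected from GitHub (for example through its REST API), which this sketch does not do.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Issue:
    # Hypothetical model: only the labels attached to an issue are kept.
    labels: List[str] = field(default_factory=list)


@dataclass
class Repo:
    # Hypothetical snapshot of two repository metrics plus the issue page.
    name: str
    contributors: int
    dependencies: int
    issues: List[Issue] = field(default_factory=list)


def select_top(repos: List[Repo], metric: Callable[[Repo], int], k: int) -> List[Repo]:
    """Return the k repositories with the highest value of `metric`."""
    return sorted(repos, key=metric, reverse=True)[:k]


def label_ratio(repo: Repo) -> float:
    """Fraction of a repository's issues that carry at least one label."""
    if not repo.issues:
        return 0.0
    return sum(1 for issue in repo.issues if issue.labels) / len(repo.issues)


def summarize(selected: List[Repo]) -> Dict[str, float]:
    """Mean issue count and mean label ratio over the selected repositories."""
    return {
        "mean_issues": mean(len(r.issues) for r in selected),
        "mean_label_ratio": mean(label_ratio(r) for r in selected),
    }


# Toy, made-up repositories used only to exercise the helpers.
repos = [
    Repo("a", contributors=120, dependencies=3,
         issues=[Issue(["bug"]), Issue(["docs"]), Issue([]), Issue(["bug"])]),
    Repo("b", contributors=80, dependencies=40,
         issues=[Issue([]), Issue(["bug"])]),
    Repo("c", contributors=5, dependencies=60,
         issues=[Issue([])]),
]

# Select the top two repositories by each metric and compare their issue pages.
by_contributors = summarize(select_top(repos, lambda r: r.contributors, k=2))
by_dependencies = summarize(select_top(repos, lambda r: r.dependencies, k=2))
print("by contributors:", by_contributors)
print("by dependencies:", by_dependencies)
```

Ranking by a different metric changes which repositories enter the sample and therefore the issue-page statistics; the toy numbers only demonstrate the mechanics and say nothing about the study's actual results.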

Original language: English
Title of host publication: Proceedings - 2020 IEEE 20th International Conference on Software Quality, Reliability, and Security, QRS 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 458-466
Number of pages: 9
ISBN (Electronic): 9781728189130
Publication status: Published - Dec 2020
Externally published: Yes
Event: 20th IEEE International Conference on Software Quality, Reliability, and Security, QRS 2020 - Macau, China
Duration: Dec 11, 2020 to Dec 14, 2020

Publication series

Name: Proceedings - 2020 IEEE 20th International Conference on Software Quality, Reliability, and Security, QRS 2020

Conference

Conference: 20th IEEE International Conference on Software Quality, Reliability, and Security, QRS 2020
Country: China
City: Macau
Period: 12/11/20 to 12/14/20

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Safety, Risk, Reliability and Quality
  • Modelling and Simulation
  • Software
