Automatic topic classification of test cases using text mining at an Android smartphone vendor

Junji Shimagaki, Yasutaka Kamei, Naoyasu Ubayashi, Abram Hindle

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Background: An Android smartphone is an ecosystem of applications, drivers, operating system components, and assets. The volume of software is large, and the number of test cases needed to cover the functionality of an Android system is substantial. Enormous effort has already been invested in properly answering the question "what features and apps were tested and verified?". This insight is provided by dashboards that summarize test coverage and results per feature. One way to achieve this is to manually tag or label test cases with the topic or function they cover, much like function points. At the studied Android smartphone vendor, tests are labelled with manually defined tags, so-called "feature labels (FLs)", and the FLs serve to categorize 100s to 1000s of test cases into 10 to 50 groups. Aim: Unfortunately for developers, manually assigning FLs to 1000s of test cases is a time-consuming task, leading to inaccurately labeled test cases, which render the dashboard useless. We created an automated system that suggests tags/labels for developers' test cases instead of requiring manual labeling. Method: We use machine learning models to predict and label the functionality tested by 10,000 test cases developed at the company. Results: In quantitative experiments, our models achieved acceptable F1 scores between 0.3 and 0.88. In qualitative studies with expert teams, we also showed that the hierarchy and path of a test is a good predictor of its feature label. Conclusions: We find that this method can reduce the tedious manual effort that software developers spend classifying test cases, while providing more accurate classification results.
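The paper does not publish its implementation, but the approach it describes maps onto a standard text-classification pipeline. Below is a minimal sketch, assuming a scikit-learn setup, of how FLs might be suggested from test case paths; the paths, label names, and choice of TF-IDF plus logistic regression are illustrative assumptions, not the authors' actual models.

```python
# Minimal sketch (not the authors' implementation): suggesting feature
# labels (FLs) for test cases from their textual hierarchy/path, which
# the paper reports is a good predictor of a test's feature label.
# All paths and labels below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

test_paths = [
    "camera/video/test_record_1080p",
    "camera/photo/test_hdr_capture",
    "camera/photo/test_burst_mode",
    "telephony/call/test_incoming_call",
    "telephony/call/test_conference_call",
    "telephony/sms/test_send_long_message",
    "wifi/scan/test_background_scan",
    "wifi/connect/test_wpa2_handshake",
    "wifi/connect/test_reconnect_after_sleep",
]
feature_labels = ["Camera"] * 3 + ["Telephony"] * 3 + ["WiFi"] * 3

X_train, X_test, y_train, y_test = train_test_split(
    test_paths, feature_labels, test_size=3, stratify=feature_labels,
    random_state=0,
)

# Character n-grams cope with the slash/underscore-delimited tokens
# that are typical of test identifiers.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# The paper reports F1 between 0.3 and 0.88 across its models; macro-F1
# is one common way to aggregate per-label scores.
print("macro F1:", f1_score(y_test, model.predict(X_test), average="macro"))
print("suggested FL:", model.predict(["camera/video/test_slow_motion"]))
```

In practice the predicted label would be shown to developers as a suggestion to accept or correct, rather than applied automatically, matching the paper's goal of assisting rather than replacing manual tagging.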

Original language: English
Title of host publication: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2018
Publisher: IEEE Computer Society
ISBN (Electronic): 9781450358231
DOI: 10.1145/3239235.3268927
Publication status: Published - Oct 11, 2018
Event: 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2018 - Oulu, Finland
Duration: Oct 11, 2018 - Oct 12, 2018

Publication series

Name: International Symposium on Empirical Software Engineering and Measurement
ISSN (Print): 1949-3770
ISSN (Electronic): 1949-3789

Conference

Conference: 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2018
Country: Finland
City: Oulu
Period: 10/11/18 - 10/12/18

Fingerprint

Smartphones
Labels
Android (operating system)
Application programs
Ecosystems
Labeling
Learning systems
Industry

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software

Cite this

Shimagaki, J., Kamei, Y., Ubayashi, N., & Hindle, A. (2018). Automatic topic classification of test cases using text mining at an Android smartphone vendor. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2018 [a32] (International Symposium on Empirical Software Engineering and Measurement). IEEE Computer Society. https://doi.org/10.1145/3239235.3268927

@inproceedings{0832438bc64d4c609c6e0d4793b274ff,
title = "Automatic topic classification of test cases using text mining at an Android smartphone vendor",
abstract = "Background: An Android smartphone is an ecosystem of applications, drivers, operating system components, and assets. The volume of the software is large and the number of test cases needed to cover the functionality of an Android system is substantial. Enormous effort has been already taken to properly quantify {"}what features and apps were tested and verified?{"}. This insight is provided by dashboards that summarize test coverage and results per feature. One method to achieve this is to manually tag or label test cases with the topic or function they cover, much like function points. At the studied Android smartphone vendor, tests are labelled with manually defined tags, so-called {"}feature labels (FLs){"}, and the FLs serve to categorize 100s to 1000s test cases into 10 to 50 groups. Aim: Unfortunately for developers, manual assignment of FLs to 1000s of test cases is a time consuming task, leading to inaccurately labeled test cases, which will render the dashboard useless. We created an automated system that suggests tags/labels to the developers for their test cases rather than manual labeling. Method: We use machine learning models to predict and label the functionality tested by 10,000 test cases developed at the company. Results: Through the quantitative experiments, our models achieved acceptable F-1 performance of 0.3 to 0.88. Also through the qualitative studies with expert teams, we showed that the hierarchy and path of tests was a good predictor of a feature's label. Conclusions: We find that this method can reduce tedious manual effort that software developers spent classifying test cases, while providing more accurate classification results.",
author = "Junji Shimagaki and Yasutaka Kamei and Naoyasu Ubayashi and Abram Hindle",
year = "2018",
month = "10",
day = "11",
doi = "10.1145/3239235.3268927",
language = "English",
series = "International Symposium on Empirical Software Engineering and Measurement",
publisher = "IEEE Computer Society",
booktitle = "Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2018",
address = "United States",

}
