Automated duplicate bug report detection using multi-factor analysis

Jie Zou, Ling Xu, Mengning Yang, Xiaohong Zhang, Jun Zeng, Sachio Hirokawa

Research output: Contribution to journal, Article

7 Citations (Scopus)

Abstract

Bug reports written in natural language are often voluminous, ambiguous, and poorly written, which makes duplicate bug report detection challenging. Current automatic duplicate bug report detection techniques focus mainly on textual information and ignore other useful factors. To improve detection accuracy, we propose a new approach called the LNG (LDA and N-gram) model, which takes advantage of both the topic model LDA and the word-based N-gram model. LNG considers multiple factors that potentially affect detection accuracy, including textual information, semantic correlation, word order, contextual connections, and categorical information. In addition, the N-gram model adopted in LNG is improved by modifying its similarity algorithm. The experiment is conducted on more than 230,000 real bug reports from the Eclipse project. For the evaluation, we propose a new metric, the exact-accuracy (EA) rate, which helps in understanding the performance of duplicate detection. The evaluation results show that the recall rate, precision rate, and EA rate of the proposed method are all higher than when LDA and N-gram are used separately. Moreover, the recall rate improves by 2.96%-10.53% compared with the state-of-the-art approach DBTM.
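The idea of combining a topic-level signal with a word-order-sensitive signal can be illustrated with a minimal sketch. It is not the LNG model itself: the abstract does not give the modified N-gram similarity or the way the two models are combined, so the Dice coefficient over word bigrams, the linear blend, the `weight` parameter, and all function names below are assumptions. The topic vectors are hand-written stand-ins for what a trained LDA model might output.

```python
from math import sqrt


def word_ngrams(text, n=2):
    """Return the set of word n-grams of a lower-cased, whitespace-split text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Dice coefficient over word n-gram sets (a stand-in similarity, assumed)."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def cosine(u, v):
    """Cosine similarity between two equal-length topic distributions."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def combined_score(new, old, weight=0.5):
    """Hypothetical linear blend of topic similarity and n-gram similarity."""
    topic_sim = cosine(new["topics"], old["topics"])
    text_sim = ngram_similarity(new["text"], old["text"])
    return weight * topic_sim + (1 - weight) * text_sim


def rank_candidates(new, existing, weight=0.5):
    """Sort existing reports from most to least likely duplicate of `new`."""
    return sorted(existing, key=lambda r: combined_score(new, r, weight), reverse=True)


# Made-up reports; "topics" would come from an LDA model in a real pipeline.
reports = [
    {"id": 1, "text": "editor crashes when opening large file", "topics": [0.8, 0.1, 0.1]},
    {"id": 2, "text": "toolbar icons missing after update", "topics": [0.1, 0.8, 0.1]},
]
new_report = {"id": 3, "text": "editor crashes when opening a large file", "topics": [0.7, 0.2, 0.1]}

for candidate in rank_candidates(new_report, reports):
    print(candidate["id"], round(combined_score(new_report, candidate), 3))
```

In this sketch the bigram term rewards shared word order, while the topic term can still give credit when two reports describe the same problem in different words. A real system would tune the blend and learn the topic vectors from the bug repository.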

Original language: English
Pages (from-to): 1762-1775
Number of pages: 14
Journal: IEICE Transactions on Information and Systems
Volume: E99D
Issue number: 7
DOI: 10.1587/transinf.2016EDP7052
Publication status: Published - Jul 2016

Fingerprint

Factor analysis
Semantics
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Cite this

Automated duplicate bug report detection using multi-factor analysis. / Zou, Jie; Xu, Ling; Yang, Mengning; Zhang, Xiaohong; Zeng, Jun; Hirokawa, Sachio.

In: IEICE Transactions on Information and Systems, Vol. E99D, No. 7, 07.2016, p. 1762-1775.

@article{cc7209c1364c43628b6835a4a6dad3a1,
title = "Automated duplicate bug report detection using multi-factor analysis",
author = "Jie Zou and Ling Xu and Mengning Yang and Xiaohong Zhang and Jun Zeng and Sachio Hirokawa",
year = "2016",
month = "7",
doi = "10.1587/transinf.2016EDP7052",
language = "English",
volume = "E99D",
pages = "1762--1775",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "一般社団法人電子情報通信学会",
number = "7",

}

TY - JOUR

T1 - Automated duplicate bug report detection using multi-factor analysis

AU - Zou, Jie

AU - Xu, Ling

AU - Yang, Mengning

AU - Zhang, Xiaohong

AU - Zeng, Jun

AU - Hirokawa, Sachio

PY - 2016/7

Y1 - 2016/7

UR - http://www.scopus.com/inward/record.url?scp=84976906515&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84976906515&partnerID=8YFLogxK

U2 - 10.1587/transinf.2016EDP7052

DO - 10.1587/transinf.2016EDP7052

M3 - Article

AN - SCOPUS:84976906515

VL - E99D

SP - 1762

EP - 1775

JO - IEICE Transactions on Information and Systems

JF - IEICE Transactions on Information and Systems

SN - 0916-8532

IS - 7

ER -