Mining revision histories to detect cross-language clones without intermediates

Xiao Cheng, Zhiming Peng, Lingxiao Jiang, Hao Zhong, Haibo Yu, Jianjun Zhao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Citations (Scopus)

Abstract

To attract more users on different platforms, many projects release their versions in multiple programming languages (e.g., Java and C#). They typically have many code snippets that implement similar functionalities, i.e., cross-language clones. Programmers often need to track and modify cross-language clones consistently to maintain similar functionalities across different language implementations. In literature, researchers have proposed approaches to detect cross-language clones, mostly for languages that share a common intermediate language (such as the .NET language family) so that techniques for detecting single-language clones can be applied. As a result, those approaches cannot detect cross-language clones for many projects that are not implemented in a .NET language. To overcome the limitation, in this paper, we propose a novel approach, CLCMiner, that detects cross-language clones automatically without the need of an intermediate language. Our approach mines such clones from revision histories, which reflect how programmers maintain cross-language clones in practice. We have implemented a prototype tool for our approach and conducted an evaluation on five open source projects that have versions in Java and C#. The results show that CLCMiner achieves high accuracy and point to promising future work.
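The abstract does not spell out how CLCMiner compares revisions, so the following is only a minimal sketch of the general idea of mining revision histories for cross-language clones, not the authors' algorithm. It assumes that each change is available as unified-diff text, that identifier tokens are a reasonable language-neutral signal, and that a multiset Jaccard score with an arbitrary threshold is good enough to rank candidates. The names `changed_tokens`, `similarity`, `candidate_clones`, and the `threshold` value are all hypothetical.

```python
import re
from collections import Counter

# Identifier-like tokens; punctuation and operators are ignored (an assumption).
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def changed_tokens(diff_text):
    """Return a bag of lowercased identifier tokens from the added and
    removed lines of a unified diff, skipping the +++/--- file headers."""
    bag = Counter()
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            bag.update(token.lower() for token in TOKEN.findall(line[1:]))
    return bag


def similarity(a, b):
    """Multiset Jaccard similarity between two token bags (0.0 to 1.0)."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 0.0


def candidate_clones(java_diffs, csharp_diffs, threshold=0.5):
    """Pair every Java diff with every C# diff and keep the pairs whose
    token similarity reaches the (hypothetical) threshold, best first."""
    pairs = []
    for java_name, java_diff in java_diffs.items():
        java_bag = changed_tokens(java_diff)
        for cs_name, cs_diff in csharp_diffs.items():
            score = similarity(java_bag, changed_tokens(cs_diff))
            if score >= threshold:
                pairs.append((java_name, cs_name, score))
    return sorted(pairs, key=lambda pair: -pair[2])
```

Lowercasing the tokens is a deliberate simplification: it lets a Java method such as `getCount` match its C# counterpart `GetCount` without any shared intermediate representation, at the cost of some false positives.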

Original language: English
Title of host publication: ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering
Editors: Sarfraz Khurshid, David Lo, Sven Apel
Publisher: Association for Computing Machinery, Inc
Pages: 696-701
Number of pages: 6
ISBN (Electronic): 9781450338455
DOIs: https://doi.org/10.1145/2970276.2970363
Publication status: Published - Aug 25 2016
Event: 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016 - Singapore, Singapore
Duration: Sep 3 2016 to Sep 7 2016

Publication series

Name: ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering

Other

Other: 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016
Country: Singapore
City: Singapore
Period: 9/3/16 to 9/7/16

Fingerprint

Java programming language

All Science Journal Classification (ASJC) codes

  • Software
  • Computational Theory and Mathematics
  • Human-Computer Interaction

Cite this

Cheng, X., Peng, Z., Jiang, L., Zhong, H., Yu, H., & Zhao, J. (2016). Mining revision histories to detect cross-language clones without intermediates. In S. Khurshid, D. Lo, & S. Apel (Eds.), ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (pp. 696-701). (ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering). Association for Computing Machinery, Inc. https://doi.org/10.1145/2970276.2970363

@inproceedings{f7571ede2b984dabb844fd50b26f6fd9,
title = "Mining revision histories to detect cross-language clones without intermediates",
abstract = "To attract more users on different platforms, many projects release their versions in multiple programming languages (e.g., Java and C#). They typically have many code snippets that implement similar functionalities, i.e., cross-language clones. Programmers often need to track and modify cross-language clones consistently to maintain similar functionalities across different language implementations. In literature, researchers have proposed approaches to detect cross-language clones, mostly for languages that share a common intermediate language (such as the .NET language family) so that techniques for detecting single-language clones can be applied. As a result, those approaches cannot detect cross-language clones for many projects that are not implemented in a .NET language. To overcome the limitation, in this paper, we propose a novel approach, CLCMiner, that detects cross-language clones automatically without the need of an intermediate language. Our approach mines such clones from revision histories, which reflect how programmers maintain cross-language clones in practice. We have implemented a prototype tool for our approach and conducted an evaluation on five open source projects that have versions in Java and C#. The results show that CLCMiner achieves high accuracy and point to promising future work.",
author = "Xiao Cheng and Zhiming Peng and Lingxiao Jiang and Hao Zhong and Haibo Yu and Jianjun Zhao",
year = "2016",
month = "8",
day = "25",
doi = "10.1145/2970276.2970363",
language = "English",
series = "ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering",
publisher = "Association for Computing Machinery, Inc",
pages = "696--701",
editor = "Sarfraz Khurshid and David Lo and Sven Apel",
booktitle = "ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering",

}

TY - GEN

T1 - Mining revision histories to detect cross-language clones without intermediates

AU - Cheng, Xiao

AU - Peng, Zhiming

AU - Jiang, Lingxiao

AU - Zhong, Hao

AU - Yu, Haibo

AU - Zhao, Jianjun

PY - 2016/8/25

Y1 - 2016/8/25

N2 - To attract more users on different platforms, many projects release their versions in multiple programming languages (e.g., Java and C#). They typically have many code snippets that implement similar functionalities, i.e., cross-language clones. Programmers often need to track and modify cross-language clones consistently to maintain similar functionalities across different language implementations. In literature, researchers have proposed approaches to detect cross-language clones, mostly for languages that share a common intermediate language (such as the .NET language family) so that techniques for detecting single-language clones can be applied. As a result, those approaches cannot detect cross-language clones for many projects that are not implemented in a .NET language. To overcome the limitation, in this paper, we propose a novel approach, CLCMiner, that detects cross-language clones automatically without the need of an intermediate language. Our approach mines such clones from revision histories, which reflect how programmers maintain cross-language clones in practice. We have implemented a prototype tool for our approach and conducted an evaluation on five open source projects that have versions in Java and C#. The results show that CLCMiner achieves high accuracy and point to promising future work.

AB - To attract more users on different platforms, many projects release their versions in multiple programming languages (e.g., Java and C#). They typically have many code snippets that implement similar functionalities, i.e., cross-language clones. Programmers often need to track and modify cross-language clones consistently to maintain similar functionalities across different language implementations. In literature, researchers have proposed approaches to detect cross-language clones, mostly for languages that share a common intermediate language (such as the .NET language family) so that techniques for detecting single-language clones can be applied. As a result, those approaches cannot detect cross-language clones for many projects that are not implemented in a .NET language. To overcome the limitation, in this paper, we propose a novel approach, CLCMiner, that detects cross-language clones automatically without the need of an intermediate language. Our approach mines such clones from revision histories, which reflect how programmers maintain cross-language clones in practice. We have implemented a prototype tool for our approach and conducted an evaluation on five open source projects that have versions in Java and C#. The results show that CLCMiner achieves high accuracy and point to promising future work.

UR - http://www.scopus.com/inward/record.url?scp=84989187430&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84989187430&partnerID=8YFLogxK

U2 - 10.1145/2970276.2970363

DO - 10.1145/2970276.2970363

M3 - Conference contribution

AN - SCOPUS:84989187430

T3 - ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering

SP - 696

EP - 701

BT - ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering

A2 - Khurshid, Sarfraz

A2 - Lo, David

A2 - Apel, Sven

PB - Association for Computing Machinery, Inc

ER -