A fast algorithm for constructing phylogenetic trees with application to IoT malware clustering

Tianxiang He, Chansu Han, Ryoichi Isawa, Takeshi Takahashi, Shuji Kijima, Jun’ichi Takeuchi, Koji Nakao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

For efficiently handling thousands of malware specimens, we aim to quickly and automatically categorize those into malware families. A solution for this could be the neighbor-joining method using NCD (Normalized Compression Distance) as similarity of malware. It creates a phylogenetic tree of malware based on the NCDs between malware binaries for clustering. However, it is frustratingly slow because it requires (N^2 + N)/2 compression attempts for the NCDs, where N is the number of given specimens. For fast clustering, this paper presents an algorithm for efficiently constructing a phylogenetic tree by greatly reducing compression attempts. The key idea to do so is not to construct a tree of N specimens all at once. Instead, it divides N specimens into temporal clusters in advance, constructs a small tree for each temporal cluster, and joins the trees as a united tree. Intuitively, separately constructing small trees requires a much smaller number of compression attempts than (N^2 + N)/2. With experiments using 4,109 in-the-wild malware specimens, we confirm that our algorithm achieved clustering 22 times faster than the neighbor-joining method with a good accuracy of 97%.
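The compression count quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes zlib as the compressor and the standard NCD definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)); the function names `compressed_size`, `ncd_matrix`, `full_attempts`, and `split_attempts` are hypothetical. A full matrix needs N single compressions plus N(N-1)/2 pairwise ones, which gives (N^2 + N)/2. The split count below covers only the per-cluster trees and ignores whatever the paper spends on forming the clusters and joining the trees, so it is a lower bound on the total work.

```python
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd_matrix(specimens):
    """Return (distance matrix, number of compression attempts)."""
    n = len(specimens)
    attempts = 0

    # One compression per specimen: C(x).
    single = []
    for s in specimens:
        single.append(compressed_size(s))
        attempts += 1

    # One compression per unordered pair: C(xy).
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        pair = compressed_size(specimens[i] + specimens[j])
        attempts += 1
        d = (pair - min(single[i], single[j])) / max(single[i], single[j])
        dist[i][j] = dist[j][i] = d
    return dist, attempts


def full_attempts(n: int) -> int:
    """Compression attempts for one tree over n specimens: (n^2 + n) / 2."""
    return (n * n + n) // 2


def split_attempts(cluster_sizes) -> int:
    """Attempts for building one small tree per cluster (joining cost excluded)."""
    return sum(full_attempts(m) for m in cluster_sizes)


print(full_attempts(4109))  # 8443995
```

For the 4,109 specimens mentioned in the abstract, the full matrix needs 8,443,995 compression attempts. As an illustration only, ten equal clusters of 100 specimens would need 10 * 5,050 = 50,500 attempts for their small trees, against 500,500 for a single tree over the same 1,000 specimens.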

Original language: English
Title of host publication: Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings
Editors: Tom Gedeon, Kok Wai Wong, Minho Lee
Publisher: Springer
Pages: 766-778
Number of pages: 13
ISBN (Print): 9783030367077
DOIs: https://doi.org/10.1007/978-3-030-36708-4_63
Publication status: Published - Jan 1 2019
Event: 26th International Conference on Neural Information Processing, ICONIP 2019 - Sydney, Australia
Duration: Dec 12 2019 to Dec 15 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11953 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th International Conference on Neural Information Processing, ICONIP 2019
Country: Australia
City: Sydney
Period: 12/12/19 to 12/15/19

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

He, T., Han, C., Isawa, R., Takahashi, T., Kijima, S., Takeuchi, J., & Nakao, K. (2019). A fast algorithm for constructing phylogenetic trees with application to IoT malware clustering. In T. Gedeon, K. W. Wong, & M. Lee (Eds.), Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings (pp. 766-778). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11953 LNCS). Springer. https://doi.org/10.1007/978-3-030-36708-4_63