A fast algorithm for constructing phylogenetic trees with application to IoT malware clustering

Tianxiang He, Chansu Han, Ryoichi Isawa, Takeshi Takahashi, Shuji Kijima, Jun’ichi Takeuchi, Koji Nakao

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

For efficiently handling thousands of malware specimens, we aim to quickly and automatically categorize those into malware families. A solution for this could be the neighbor-joining method using NCD (Normalized Compression Distance) as similarity of malware. It creates a phylogenetic tree of malware based on the NCDs between malware binaries for clustering. However, it is frustratingly slow because it requires (N2+N)/2 compression attempts for the NCDs, where N is the number of given specimens. For fast clustering, this paper presents an algorithm for efficiently constructing a phylogenetic tree by greatly reducing compression attempts. The key idea to do so is not to construct a tree of N specimens all at once. Instead, it divides N specimens into temporal clusters in advance, constructs a small tree for each temporal cluster, and joins the trees as a united tree. Intuitively, separately constructing small trees requires a much smaller number of compression attempts than (N2+N)/2. With experiments using 4,109 in-the-wild malware specimens, we confirm that our algorithm achieved clustering 22 times faster than the neighbor-joining method with a good accuracy of 97%.

Original languageEnglish
Title of host publicationNeural Information Processing - 26th International Conference, ICONIP 2019, Proceedings
EditorsTom Gedeon, Kok Wai Wong, Minho Lee
PublisherSpringer
Pages766-778
Number of pages13
ISBN (Print)9783030367077
DOIs
Publication statusPublished - 2019
Event26th International Conference on Neural Information Processing, ICONIP 2019 - Sydney, Australia
Duration: Dec 12 2019Dec 15 2019

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume11953 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference26th International Conference on Neural Information Processing, ICONIP 2019
CountryAustralia
CitySydney
Period12/12/1912/15/19

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint Dive into the research topics of 'A fast algorithm for constructing phylogenetic trees with application to IoT malware clustering'. Together they form a unique fingerprint.

Cite this