Succinct interval-splitting tree for scalable similarity search of compound-protein pairs with property constraints

Yasuo Tabei, Akihiro Kishimoto, Masaaki Kotera, Yoshihiro Yamanishi

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Citations (Scopus)

Abstract

Analyzing functional interactions between small compounds and proteins is indispensable in genomic drug discovery. Since rich information on various compound-protein interactions is available in recent molecular databases, strong demands for making best use of such databases require to invent powerful methods to help us find new functional compoundprotein pairs on a large scale. We present the succinct interval-splitting tree algorithm (SITA) that efficiently performs similarity search in databases for compound-protein pairs with respect to both binary fingerprints and real-valued properties. SITA achieves both time and space efficiency by developing the data structure called interval-splitting trees, which enables to efficiently prune the useless portions of search space, and by incorporating the ideas behind wavelet tree, a succinct data structure to compactly represent trees. We experimentally test SITA on the ability to retrieve similar compound-protein pairs/substrate-product pairs for a query from large databases with over 200 million compoundprotein pairs/substrate-product pairs and show that SITA performs better than other possible approaches.

Original languageEnglish
Title of host publicationKDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
EditorsRajesh Parekh, Jingrui He, Dhillon S. Inderjit, Paul Bradley, Yehuda Koren, Rayid Ghani, Ted E. Senator, Robert L. Grossman, Ramasamy Uthurusamy
PublisherAssociation for Computing Machinery
Pages176-184
Number of pages9
ISBN (Electronic)9781450321747
DOIs
Publication statusPublished - Aug 11 2013
Event19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013 - Chicago, United States
Duration: Aug 11 2013Aug 14 2013

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
VolumePart F128815

Other

Other19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013
CountryUnited States
CityChicago
Period8/11/138/14/13

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Fingerprint Dive into the research topics of 'Succinct interval-splitting tree for scalable similarity search of compound-protein pairs with property constraints'. Together they form a unique fingerprint.

Cite this