Succinct interval-splitting tree for scalable similarity search of compound-protein pairs with property constraints

Yasuo Tabei, Akihiro Kishimoto, Masaaki Kotera, Yoshihiro Yamanishi

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

3 Citations (Scopus)

Abstract

Analyzing functional interactions between small compounds and proteins is indispensable in genomic drug discovery. Because recent molecular databases hold rich information on various compound-protein interactions, making the best use of such databases requires powerful methods that help find new functional compound-protein pairs on a large scale. We present the succinct interval-splitting tree algorithm (SITA), which efficiently performs similarity search in databases of compound-protein pairs with respect to both binary fingerprints and real-valued properties. SITA achieves both time and space efficiency by developing a data structure called the interval-splitting tree, which enables efficient pruning of the useless portions of the search space, and by incorporating the ideas behind the wavelet tree, a succinct data structure for compactly representing trees. We experimentally test SITA on its ability to retrieve similar compound-protein pairs/substrate-product pairs for a query from large databases with over 200 million compound-protein pairs/substrate-product pairs, and show that SITA performs better than other possible approaches.
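
To make the query in the abstract concrete, the following is a minimal brute-force baseline, not SITA itself. It assumes Jaccard (Tanimoto) similarity over fingerprints stored as sets of on-bit indices and a per-property absolute-difference bound; the abstract names neither, so both are illustrative assumptions, as are the hypothetical names `jaccard`, `linear_scan_search`, `min_similarity`, and `max_prop_diff`.

```python
from typing import List, Sequence, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def linear_scan_search(
    database: Sequence[Tuple[Set[int], Sequence[float]]],
    query_fp: Set[int],
    query_props: Sequence[float],
    min_similarity: float,
    max_prop_diff: float,
) -> List[int]:
    """Return indices of entries that satisfy both constraints (assumed form).

    An entry matches when every real-valued property lies within
    max_prop_diff of the query's value and the fingerprint similarity
    is at least min_similarity.
    """
    hits = []
    for idx, (fp, props) in enumerate(database):
        # Property constraint first: it is cheap and skips the set operations.
        if any(abs(p - q) > max_prop_diff for p, q in zip(props, query_props)):
            continue
        if jaccard(fp, query_fp) >= min_similarity:
            hits.append(idx)
    return hits
```

A scan like this visits every entry for every query. The abstract's claim is that the interval-splitting tree avoids this by pruning portions of the search space that cannot contain a match.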

Original language: English
Title of host publication: KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: Rajesh Parekh, Jingrui He, Dhillon S. Inderjit, Paul Bradley, Yehuda Koren, Rayid Ghani, Ted E. Senator, Robert L. Grossman, Ramasamy Uthurusamy
Publisher: Association for Computing Machinery
Pages: 176-184
Number of pages: 9
ISBN (Electronic): 9781450321747
DOI
Publication status: Published - Aug 11 2013
Event: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013 - Chicago, United States
Duration: Aug 11 2013 to Aug 14 2013

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Part F128815

Other

Other: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013
Country/Territory: United States
City: Chicago
Period: 8/11/13 to 8/14/13

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
