A dynamic-static approach of model fusion for document similarity computation

Jiyi Li, Yasuhito Asano, Toshiyuki Shimizu, Masatoshi Yoshikawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The semantic similarity of text document pairs can be used for valuable applications. Various basic models have been proposed for representing document content and computing document similarity. Each basic model performs differently in different scenarios. Existing model selection or fusion approaches generate improved models from these basic models at the granularity of the document collection. These improved models are static across all document pairs and may suit only some of them. We propose a dynamic idea of model fusion, and an approach based on a Dynamic-Static Fusion Model (DSFM) at the granularity of document pairs, which is dynamic for each document pair. The dynamic module in DSFM learns to rank the basic models to predict the best basic model for a given document pair. We propose a model categorization method to construct ideal model labels of document pairs for learning in this dynamic module. The static module in DSFM is based on linear regression. We also propose a model selection method to select appropriate candidate basic models for fusion and improve the performance. Experiments on public document collections that contain paragraph pairs and sentence pairs with human-rated similarity illustrate the effectiveness of our approach.
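The contrast the abstract draws between static fusion (one combination for the whole collection) and dynamic, per-pair model choice can be illustrated with a minimal Python sketch. This is not the authors' implementation: the two basic models (`jaccard`, `cosine_tf`), the labeling rule in `ideal_model_label` (the model closest to the human rating), and the `selector` callable standing in for the learning-to-rank module are all assumptions made for illustration, as are the fixed weights, which in practice would be fit by linear regression.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Assumed basic model 1: overlap of word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def cosine_tf(a, b):
    """Assumed basic model 2: cosine similarity of term-frequency vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

BASIC_MODELS = [jaccard, cosine_tf]

def static_fusion(a, b, weights, bias=0.0):
    """Static fusion: the same linear combination for every document pair.
    The weights are hypothetical here; they would be fit by linear regression."""
    return bias + sum(w * m(a, b) for w, m in zip(weights, BASIC_MODELS))

def ideal_model_label(a, b, human_score):
    """Assumed labeling rule: index of the basic model whose score is
    closest to the human rating for this pair."""
    errors = [abs(m(a, b) - human_score) for m in BASIC_MODELS]
    return errors.index(min(errors))

def dynamic_select(a, b, selector):
    """Dynamic choice: `selector` (hypothetical stand-in for a learned ranker)
    returns the index of the basic model to use for this pair."""
    return BASIC_MODELS[selector(a, b)](a, b)

a = "the cat sat on the mat"
b = "the cat lay on the mat"
print(jaccard(a, b), cosine_tf(a, b))
print(static_fusion(a, b, weights=[0.5, 0.5]))
print(ideal_model_label(a, b, human_score=0.9))
```

With a human rating of 0.9 for this pair, the labeling rule picks the term-frequency cosine model, while a rating near 0.6 would pick the Jaccard model; a per-pair selector can follow such differences, whereas a single set of fusion weights cannot.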

Original language: English
Title of host publication: Web Information Systems Engineering – WISE 2015 - 16th International Conference, Proceedings
Editors: Shu-Ching Chen, Tao Li, Hua Wang, Yanchun Zhang, Wojciech Cellary, Dingding Wang, Jianyong Wang
Publisher: Springer Verlag
Pages: 353-368
Number of pages: 16
ISBN (Print): 9783319261898
DOI
Publication status: Published - 2015
Externally published: Yes
Event: 16th International Conference on Web Information Systems Engineering, WISE 2015 - Miami, United States
Duration: Nov 1, 2015 to Nov 3, 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9418
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Conference on Web Information Systems Engineering, WISE 2015
Country/Territory: United States
City: Miami
Period: 11/1/15 to 11/3/15

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
