Implementation of a high-speed and high-precision XML information retrieval system on relational databases

Kei Fujimoto, Toshiyuki Shimizu, Norimasa Terada, Kenji Hatano, Yu Suzuki, Toshiyuki Amagasa, Hiroko Kinutani, Masatoshi Yoshikawa

Research output: Contribution to book/report › Conference contribution

Citations (Scopus): 6

Abstract

This paper describes an XML information retrieval system that we have developed. The system is based on the vector space model and is implemented on top of XRel, a relational XML database system developed in our research group. Because a single XML document usually contains many XML fragments, processing a query retrieves a large number of fragments. Some of these fragments are not appropriate as query targets, so keeping all of them degrades retrieval precision and increases query processing time. In existing methods, human experts manually select the retrieval targets when an XML collection is stored in the system. Such manual selection is not feasible when many kinds of XML documents are stored. To cope with this problem, we propose a method for automatically selecting document-centric fragments using three measures: period ratio, number of different words, and empirical rules. By deleting inappropriate data-centric fragments from keyword query results, we improve both the accuracy and the performance of our system. Performance evaluations confirmed improvements in retrieval precision and query processing speed.
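The fragment-selection step described in the abstract can be illustrated with a minimal Python sketch. It assumes that *period ratio* means the number of periods per word in a fragment's text and that *number of different words* counts distinct case-folded tokens. The thresholds `MIN_PERIOD_RATIO` and `MIN_DISTINCT_WORDS`, the `Fragment` type, and the tag blocklist standing in for the paper's empirical rules are all hypothetical, because this record gives neither the authors' definitions nor their values.

```python
import re
from dataclasses import dataclass

# Hypothetical thresholds: the paper's actual values are not given here.
MIN_PERIOD_RATIO = 0.01
MIN_DISTINCT_WORDS = 10

# Hypothetical stand-in for the paper's empirical rules: element names
# assumed to hold data-centric content.
EXCLUDED_TAGS = {"author", "date", "pages", "title"}

WORD_RE = re.compile(r"[A-Za-z0-9']+")


@dataclass
class Fragment:
    path: str  # element path, e.g. "/article/sec"
    text: str  # concatenated text content of the element


def period_ratio(text: str) -> float:
    """Assumed definition: periods per word (0.0 for text with no words)."""
    words = WORD_RE.findall(text)
    return text.count(".") / len(words) if words else 0.0


def distinct_words(text: str) -> int:
    """Number of different case-folded words in the text."""
    return len({w.lower() for w in WORD_RE.findall(text)})


def is_document_centric(fragment: Fragment) -> bool:
    # Hypothetical empirical rule: reject fragments by their element name.
    tag = fragment.path.rsplit("/", 1)[-1]
    if tag in EXCLUDED_TAGS:
        return False
    # Keep only fragments that look like running prose.
    return (period_ratio(fragment.text) >= MIN_PERIOD_RATIO
            and distinct_words(fragment.text) >= MIN_DISTINCT_WORDS)


def filter_fragments(fragments: list[Fragment]) -> list[Fragment]:
    """Drop fragments judged data-centric from a keyword-query result list."""
    return [f for f in fragments if is_document_centric(f)]


if __name__ == "__main__":
    results = [
        Fragment("/article/sec", "XML retrieval systems rank fragments. "
                 "Each fragment is scored against the query terms."),
        Fragment("/article/isbn", "3540349626 9783540349624"),
    ]
    for f in filter_fragments(results):
        print(f.path)  # prints "/article/sec"
```

A fragment is kept only if it passes all three checks. In a system like the one described, such a filter would be applied to the fragments returned for a keyword query, so that number-heavy or metadata elements never reach the ranked result list.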

Original language: English
Host publication title: Advances in XML Information Retrieval and Evaluation - 4th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2005, Revised Selected Papers
Publisher: Springer Verlag
Pages: 254-267
Number of pages: 14
ISBN (Print): 3540349626, 9783540349624
Publication status: Published - 2006
Externally published: Yes
Event: 4th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2005 - Dagstuhl Castle, Germany
Duration: Nov 28 2005 to Nov 30 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3977 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2005
Country/Territory: Germany
City: Dagstuhl Castle
Period: 11/28/05 to 11/30/05

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
