Implementation of a high-speed and high-precision XML information retrieval system on relational databases

Kei Fujimoto, Toshiyuki Shimizu, Norimasa Terada, Kenji Hatano, Yu Suzuki, Toshiyuki Amagasa, Hiroko Kinutani, Masatoshi Yoshikawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

This paper describes an XML information retrieval system that we have developed. It is based on a vector space model and implemented on top of XRel, a relational XML database system developed in our research group. When a query is processed, a large number of fragments are retrieved, because a single XML document usually contains many XML fragments. Keeping all XML fragments degrades retrieval precision and increases query processing time, because some XML fragments are not appropriate as a query target. In existing methods, retrieval targets are manually selected by human experts when an XML collection is stored in the system. Such manual selection is not feasible when many kinds of XML documents are stored in the system. To cope with this problem, we propose a method for automatically selecting document-centric fragments by introducing three measurements, namely, period ratio, number of different words, and empirical rules. By deleting inappropriate data-centric fragments from the results of a keyword query, we can improve the accuracy and performance of our system. Through performance evaluations, we confirmed the improvement of retrieval precision and query processing speed.
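The fragment-selection idea in the abstract can be illustrated with a minimal sketch. The abstract names the measurements but does not define them, so everything concrete here is an assumption: period ratio is taken to be the number of periods divided by the number of word tokens, the thresholds `MIN_PERIOD_RATIO` and `MIN_DISTINCT_WORDS` are hypothetical values, and the empirical rules are omitted entirely. This is not the authors' implementation and does not involve XRel or SQL.

```python
import re

# Hypothetical thresholds; the abstract gives no concrete values.
MIN_PERIOD_RATIO = 0.02
MIN_DISTINCT_WORDS = 10


def tokenize(text):
    """Return lowercased word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def period_ratio(text):
    """Assumed definition: periods per word token; 0.0 for empty text."""
    words = tokenize(text)
    if not words:
        return 0.0
    return text.count(".") / len(words)


def distinct_words(text):
    """Number of different word tokens in the fragment text."""
    return len(set(tokenize(text)))


def is_document_centric(text,
                        min_ratio=MIN_PERIOD_RATIO,
                        min_words=MIN_DISTINCT_WORDS):
    """Treat a fragment as document-centric if both measures pass."""
    return (period_ratio(text) >= min_ratio
            and distinct_words(text) >= min_words)


def filter_fragments(fragments):
    """Keep only the fragment texts judged document-centric."""
    return [f for f in fragments if is_document_centric(f)]
```

Under these assumptions, a fragment holding a few full sentences passes both checks, while a short value such as a year or an author name contains no periods and too few distinct words, so it would be dropped from the keyword query results.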

Original language: English
Title of host publication: Advances in XML Information Retrieval and Evaluation - 4th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2005, Revised Selected Papers
Publisher: Springer Verlag
Pages: 254-267
Number of pages: 14
ISBN (Print): 3540349626, 9783540349624
Publication status: Published - 2006
Externally published: Yes
Event: 4th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2005 - Dagstuhl Castle, Germany
Duration: Nov 28 2005 to Nov 30 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3977 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2005
Country/Territory: Germany
City: Dagstuhl Castle
Period: 11/28/05 to 11/30/05

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
