Finding characteristic substrings from compressed texts

Shunsuke Inenaga, Hideo Bannai

Research output: Chapter in book/report › Conference contribution

5 Citations (Scopus)

Abstract

Text mining from large scaled data is of great importance in computer science. In this paper, we consider fundamental problems on text mining from compressed strings, i.e., computing a longest repeating substring, longest non-overlapping repeating substring, most frequent substring, and most frequent non-overlapping substring from a given compressed string. Also, we tackle the following novel problem: given a compressed text and compressed pattern, compute the representative of the equivalence class of the pattern w.r.t. the text. We present algorithms that solve the above problems in time polynomial in the size of input compressed strings. The compression scheme we consider is straight line program (SLP) which has exponential compression, and therefore our algorithms are more efficient than any algorithms that work on uncompressed strings.
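To make the "exponential compression" remark concrete, the following is a minimal illustrative sketch of a straight line program, not code from the paper. It assumes the usual formulation in which every rule is either a single character or a concatenation of two earlier variables. The names `doubling_slp`, `derived_length`, and `expand` are hypothetical helpers introduced only for this example.

```python
from typing import Dict, Tuple, Union

# Assumed representation: a rule is a single character (terminal)
# or a pair of variable names (concatenation of two variables).
Rule = Union[str, Tuple[str, str]]


def doubling_slp(n: int) -> Dict[str, Rule]:
    """Hypothetical helper: n rules whose last variable derives 'a' * 2**(n-1)."""
    rules: Dict[str, Rule] = {"X1": "a"}
    for i in range(2, n + 1):
        rules[f"X{i}"] = (f"X{i - 1}", f"X{i - 1}")
    return rules


def derived_length(rules: Dict[str, Rule], start: str) -> int:
    """Length of the derived string, computed without decompressing it."""
    memo: Dict[str, int] = {}

    def length(var: str) -> int:
        if var not in memo:
            rule = rules[var]
            if isinstance(rule, str):
                memo[var] = 1
            else:
                memo[var] = length(rule[0]) + length(rule[1])
        return memo[var]

    return length(start)


def expand(rules: Dict[str, Rule], start: str) -> str:
    """Full decompression; only sensible for small examples."""
    rule = rules[start]
    if isinstance(rule, str):
        return rule
    return expand(rules, rule[0]) + expand(rules, rule[1])


# A small SLP with five rules deriving "abaab".
small = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),
    "X4": ("X3", "X1"),
    "X5": ("X4", "X3"),
}
```

With these definitions, `doubling_slp(40)` has only 40 rules, yet `derived_length` reports a derived string of length 2**39. This is why algorithms that run in time polynomial in the number of rules can outperform any method that first decompresses the text.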

Original language: English
Title of host publication: Proceedings of the Prague Stringology Conference 2009
Pages: 40-54
Number of pages: 15
Publication status: Published - Dec 1 2009
Event: Prague Stringology Conference 2009, PSC 2009 - Prague, Czech Republic
Duration: Aug 31 2009 to Sep 2 2009

Publication series

Name: Proceedings of the Prague Stringology Conference 2009

Other

Other: Prague Stringology Conference 2009, PSC 2009
Country/Territory: Czech Republic
City: Prague
Period: 8/31/09 to 9/2/09

All Science Journal Classification (ASJC) codes

  • Mathematics (all)
