Mining pure patterns in texts

Yasuhiro Yamada, Tetsuya Nakatoh, Kensuke Baba, Daisuke Ikeda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

We herein investigate finding unusual patterns from a given string as a text. In the present paper, the pattern is expressed as a substring of the string. The natural assumption with respect to the frequency of a pattern is that the shorter the length of the pattern, the larger the frequency of the pattern. We define a pattern to be pure if the frequencies of all of the substrings of the pattern are the same as the frequency of the pattern. This means that the substrings appear only within the pattern in the string. This condition is in contrast to the natural assumption. The present paper proposes three statistics for quantifying the purity of a pattern, i.e., probability, entropy, and difference, which are calculated based on the frequency of the pattern and its substrings. Experiments using DNA sequences reveal that patterns with large probability correspond to the features of the sequences.
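
The purity condition stated in the abstract can be illustrated with a brute-force check. The following is only a minimal sketch, not the authors' method: the function names `frequency` and `is_pure` are hypothetical, overlapping occurrences are assumed to count toward the frequency, and a pattern that never occurs is assumed not to be pure. The three statistics (probability, entropy, difference) are not defined in the abstract, so they are not reproduced here.

```python
def frequency(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `text` (assumption: overlaps count)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one character so overlapping occurrences are found.
        start = text.find(pattern, start + 1)
    return count


def is_pure(text: str, pattern: str) -> bool:
    """Return True if every non-empty substring of `pattern` occurs in
    `text` exactly as often as `pattern` itself."""
    target = frequency(text, pattern)
    if target == 0:
        # Assumption: a pattern absent from the text is not considered pure.
        return False
    n = len(pattern)
    # All distinct non-empty substrings of the pattern (quadratic in n).
    substrings = {pattern[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    return all(frequency(text, s) == target for s in substrings)


text = "xxabcxxabcxx"
print(is_pure(text, "abc"))  # "a", "b", "c", "ab", "bc" occur only inside "abc"
print(is_pure(text, "xa"))   # "x" also occurs outside "xa"
```

In this toy string, `abc` satisfies the condition because none of its substrings appears anywhere else, whereas `xa` does not because `x` is far more frequent than `xa`. The quadratic enumeration of substrings is chosen for clarity; an index structure such as a suffix array would be the usual choice for long inputs like DNA sequences.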

Original language: English
Title of host publication: Proceedings of the 2012 IIAI International Conference on Advanced Applied Informatics, IIAIAAI 2012
Pages: 285-290
Number of pages: 6
DOI
Publication status: Published - 2012
Event: 1st IIAI International Conference on Advanced Applied Informatics, IIAIAAI 2012 - Fukuoka, Japan
Duration: Sep 20, 2012 to Sep 22, 2012

Publication series

Name: Proceedings of the 2012 IIAI International Conference on Advanced Applied Informatics, IIAIAAI 2012

Other

Other: 1st IIAI International Conference on Advanced Applied Informatics, IIAIAAI 2012
Country/Territory: Japan
City: Fukuoka
Period: 9/20/12 to 9/22/12

All Science Journal Classification (ASJC) codes

  • Information Systems
