Discovering characteristic expressions in literary works

Masayuki Takeda, Tetsuya Matsumoto, Tomoko Fukuda, Ichiro Nanri

Research output: Contribution to journal, Article, peer-reviewed

7 Citations (Scopus)

Abstract

We attempt to extract characteristic expressions from literary works. That is, given two collections of literary works, one written by a particular author (positive examples) and the other by a different author (negative examples), the problem is to find expressions that appear frequently in the positive examples but seldom in the negative examples. This can be regarded as a special case of optimal pattern discovery from textual data in which only substring patterns are considered. One approach is to create a list of text substrings sorted by goodness and to have a human examine the top of the list. However, since Japanese text has no word boundaries, a substring is often a fragment of a word or phrase, so supporting the domain experts who carry out this task is a key problem. In this paper, we propose partitioning the text substrings into equivalence classes under an equivalence relation on strings originally defined by Blumer et al. (J. ACM 34(3) (1987) 578). The equivalence relation has the desirable property that all members of an equivalence class necessarily share the same goodness value. This idea substantially reduces the effort of evaluating mined patterns. We also present a method for browsing the possible superstrings of a focused string together with its context. We report successful results with two pairs of anthologies of classical Japanese poems. We expect that the extracted expressions may lead to the discovery of overlooked aspects of individual poets.
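The grouping idea in the abstract can be illustrated with a small brute-force sketch. It is not the authors' implementation (an efficient version would presumably rely on a suffix-tree-like index rather than enumerating all substrings), and the `goodness` function below is a hypothetical stand-in, since the abstract does not say which measure is used. The sketch assumes the usual reading of the Blumer et al. relation: two substrings are equivalent when extending each one left and right, for as long as every occurrence agrees on the neighbouring character, yields the same string. All function names are illustrative.

```python
from collections import defaultdict

def occurrences(docs):
    """Map every substring to the list of (doc index, start) positions where it occurs."""
    occ = defaultdict(list)
    for d, text in enumerate(docs):
        for i in range(len(text)):
            for j in range(i + 1, len(text) + 1):
                occ[text[i:j]].append((d, i))
    return occ

def maximal_extension(x, positions, docs):
    """Extend x left and right while all occurrences share the same neighbouring character."""
    left = right = 0  # characters added on each side so far
    while True:
        prev = {docs[d][i - left - 1] if i - left - 1 >= 0 else None
                for d, i in positions}
        if len(prev) != 1 or None in prev:  # disagreement or a document boundary
            break
        left += 1
    while True:
        nxt = {docs[d][i + len(x) + right] if i + len(x) + right < len(docs[d]) else None
               for d, i in positions}
        if len(nxt) != 1 or None in nxt:
            break
        right += 1
    d, i = positions[0]
    return docs[d][i - left:i + len(x) + right]

def goodness(doc_ids, n_pos, n_neg):
    """Hypothetical measure: share of positive docs hit minus share of negative docs hit."""
    pos = sum(1 for d in doc_ids if d < n_pos)
    neg = len(doc_ids) - pos
    return pos / n_pos - neg / n_neg

def ranked_classes(positive, negative):
    """Return (goodness, representative, members) per equivalence class, best first."""
    docs = list(positive) + list(negative)
    occ = occurrences(docs)
    classes = defaultdict(list)
    for x, positions in occ.items():
        classes[maximal_extension(x, positions, docs)].append(x)
    ranked = []
    for rep, members in classes.items():
        doc_ids = {d for d, _ in occ[rep]}  # the same document set for every member
        score = goodness(doc_ids, len(positive), len(negative))
        ranked.append((score, rep, sorted(members, key=lambda s: (len(s), s))))
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return ranked

# Toy data (made up for illustration): two positive texts, one negative text.
ranked = ranked_classes(["xabcx", "yabcy"], ["abd"])
print(ranked[0])
```

On this toy input the top entry is the class represented by "abc", whose members "c", "bc" and "abc" all occur in exactly the same places, so a reader only needs to inspect one representative instead of three separately ranked fragments.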

Original language: English
Pages (from-to): 525-546
Number of pages: 22
Journal: Theoretical Computer Science
Volume: 292
Issue number: 2
Publication status: Published - Jan 27 2003

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
