Impact of word segmentation errors on automatic Chinese text classification

Xi Luo, Wataru Ohyama, Tetsushi Wakabayashi, Fumitaka Kimura

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

In this paper, several sets of experiments were carried out to study the impact of word segmentation errors on automatic Chinese text classification. A comparison of four word-based approaches was carried out first; the results show that performance dropped significantly when automatic word segmentation was used instead of manual word segmentation, which means that errors introduced by automatic word segmentation have a clear impact on classification performance. We then ran the experiment with a character-based (N-gram) approach. Although the N-gram approach produces a large number of ambiguous words, the results show that it performed better than automatic word segmentation.
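To make the character-based idea concrete, the following is a minimal sketch of overlapping character N-gram extraction. It is an illustration only, not the authors' implementation: the function names `char_ngrams` and `ngram_features`, the choice of N = 2, and the sample string are all assumptions, since the abstract does not state them.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return every overlapping character n-gram of `text`, in order.

    Hypothetical helper for illustration; no word segmenter is involved.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_features(text, n=2):
    """Count n-gram occurrences, giving a simple bag-of-n-grams feature map."""
    return Counter(char_ngrams(text, n))


# Assumed sample: "中文" (Chinese) + "文本" (text) + "分类" (classification).
sample = "中文文本分类"
bigrams = char_ngrams(sample, 2)
features = ngram_features(sample, 2)
```

For this sample the bigram list also contains items such as "文文" and "本分", which straddle the intended word boundaries. These are the kind of ambiguous units the abstract refers to: the N-gram approach avoids segmentation errors at the cost of producing many such spurious candidates.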

Original language: English
Title of host publication: Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
Pages: 271-275
Number of pages: 5
Publication status: Published - 2012
Event: 10th IAPR International Workshop on Document Analysis Systems, DAS 2012 - Gold Coast, QLD, Australia
Duration: Mar 27 2012 to Mar 29 2012

Publication series

Name: Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012

Other

Other: 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
Country/Territory: Australia
City: Gold Coast, QLD
Period: 3/27/12 to 3/29/12

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
