Automatic Chinese text classification using character-based and word-based approach

Xi Luo, Wataru Oyama, Tetsushi Wakabayashi, Fumitaka Kimura

Research output: Contribution to journal (Conference article)

1 Citation (Scopus)

Abstract

In this paper, we study Chinese text classification using a character-based (N-gram) approach and a word-based approach, and propose the use of uni-gram, bi-gram, and word features of length greater than or equal to three. We also introduce a weight coefficient that can be used to give higher weights to word features. We further investigate a serial approach based on feature transformation and dimension reduction techniques to improve performance. Experimental results show that the proposed approach is efficient and effective for improving the performance of Chinese text classification.
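The feature set described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the text has already been segmented into words by some external tool, and the function name `extract_features`, the parameter `word_weight`, and its default value are hypothetical. One plausible reading of the length threshold is that words of one or two characters are already covered by the uni-gram and bi-gram features, so only longer words are added as separate features.

```python
from collections import Counter


def extract_features(words, word_weight=2.0):
    """Build a weighted bag of features from a pre-segmented text.

    words: list of word strings (segmentation is assumed to be done elsewhere).
    word_weight: hypothetical coefficient that scales word features.
    """
    text = "".join(words)
    features = Counter()

    # Character uni-grams: every single character counts once.
    for ch in text:
        features[("uni", ch)] += 1.0

    # Character bi-grams: every pair of adjacent characters counts once.
    for a, b in zip(text, text[1:]):
        features[("bi", a + b)] += 1.0

    # Word features: only words of three or more characters,
    # scaled by the weight coefficient.
    for w in words:
        if len(w) >= 3:
            features[("word", w)] += word_weight

    return features


# Example input: two segmented words, one of length 3 and one of length 2.
feats = extract_features(["计算机", "科学"], word_weight=2.0)
```

In this example the three-character word is kept as a weighted word feature, while the two-character word appears only as a bi-gram. The resulting counts could then be passed to any feature transformation, dimension reduction, or classifier stage.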

Original language: English
Article number: 6628638
Pages (from-to): 329-333
Number of pages: 5
Journal: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Publication status: Published - Dec 11, 2013
Externally published: Yes
Event: 12th International Conference on Document Analysis and Recognition, ICDAR 2013 - Washington, DC, United States
Duration: Aug 25, 2013 to Aug 28, 2013

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
