A study on automatic Chinese text classification

Xi Luo, Wataru Oyama, Tetsushi Wakabayashi, Fumitaka Kimura

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

5 Citations (Scopus)

Abstract

In this paper, we perform Chinese text classification using N-gram (uni-gram, bi-gram, and mixed uni-gram/bi-gram) frequency features instead of word frequency features to represent documents, and we propose the use of mixed uni-gram/bi-gram features after feature transformation. We further propose a serial approach based on feature transformation and dimension reduction techniques to improve classification performance. Experimental results show that the proposed approach is efficient and effective in improving the performance of Chinese text classification. Furthermore, we present several experiments evaluating feature selection based on part-of-speech analysis; the results show that a suitable combination of parts of speech can lead to better classification performance.
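The N-gram representation described in the abstract can be illustrated with a minimal sketch. It counts overlapping character uni-grams, bi-grams, or both (the "mixed" case) without any word segmentation, then builds a frequency vector over a shared vocabulary. The abstract does not say which feature transformation the authors use, so the elementwise power transform `x ** power` below is a hypothetical stand-in, and all function names (`char_ngrams`, `ngram_counts`, `build_vocabulary`, `to_vector`) are illustrative, not taken from the paper. The dimension reduction step is not shown.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return overlapping character n-grams (no word segmentation needed)."""
    # Assumption: whitespace is dropped so n-grams cover only content characters.
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_counts(text: str, mode: str = "mixed") -> Counter:
    """Count uni-grams ('uni'), bi-grams ('bi'), or both ('mixed')."""
    if mode not in ("uni", "bi", "mixed"):
        raise ValueError(f"unknown mode: {mode!r}")
    counts: Counter = Counter()
    if mode in ("uni", "mixed"):
        counts.update(char_ngrams(text, 1))
    if mode in ("bi", "mixed"):
        counts.update(char_ngrams(text, 2))
    return counts


def build_vocabulary(docs: list[str], mode: str = "mixed") -> dict[str, int]:
    """Map every N-gram seen in the training documents to a fixed index."""
    grams = sorted({g for d in docs for g in ngram_counts(d, mode)})
    return {g: i for i, g in enumerate(grams)}


def to_vector(text: str, vocab: dict[str, int], mode: str = "mixed",
              power: float = 0.5) -> list[float]:
    """Frequency vector over vocab, with a hypothetical power transform x -> x**power."""
    vec = [0.0] * len(vocab)
    for gram, count in ngram_counts(text, mode).items():
        idx = vocab.get(gram)
        if idx is not None:  # N-grams unseen in training are ignored
            vec[idx] = float(count) ** power
    return vec


if __name__ == "__main__":
    docs = ["中文文本分类", "文本分类方法"]
    vocab = build_vocabulary(docs, mode="mixed")
    print(len(vocab))  # 14: 7 distinct uni-grams + 7 distinct bi-grams
    print(to_vector(docs[0], vocab, mode="mixed", power=1.0))
```

Setting `power=1.0` yields raw N-gram frequencies; values below 1 compress large counts. The resulting vectors would then be passed to a dimension reduction step and a classifier, neither of which is specified here.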

Original language: English
Title of host publication: Proceedings - 11th International Conference on Document Analysis and Recognition, ICDAR 2011
Pages: 920-924
Number of pages: 5
Publication status: Published - Dec 2 2011
Event: 11th International Conference on Document Analysis and Recognition, ICDAR 2011 - Beijing, China
Duration: Sep 18 2011 to Sep 21 2011

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
ISSN (Print): 1520-5363


All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition


Cite this

Luo, X., Oyama, W., Wakabayashi, T., & Kimura, F. (2011). A study on automatic Chinese text classification. In Proceedings - 11th International Conference on Document Analysis and Recognition, ICDAR 2011 (pp. 920-924). [6065445] (Proceedings of the International Conference on Document Analysis and Recognition, ICDAR). https://doi.org/10.1109/ICDAR.2011.187