Improvement of automatic Chinese text classification by combining multiple features

Xi Luo, Wataru Ohyama, Tetsushi Wakabayashi, Fumitaka Kimura

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Uni-gram and bi-gram features form the baseline model and are then combined with word features of length greater than or equal to 3. A weight coefficient that gives higher weight to word features is also introduced. We further employ a serial approach based on feature transformation and dimension reduction techniques. The results of McNemar's test indicate that the proposed method significantly improves classification performance.
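As a rough illustration of the feature combination the abstract describes, the following minimal Python sketch merges character uni-gram and bi-gram counts with weighted word features of length 3 or more. It is not the authors' implementation: the function names, the `word_weight` value, and the pre-segmented `words` input are all hypothetical, and the feature transformation and dimension reduction steps are omitted. The `mcnemar_statistic` helper assumes the common continuity-corrected chi-square form of McNemar's test, which may differ from the variant used in the paper.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def combined_features(text, words, word_weight=2.0, min_word_len=3):
    """Build one sparse feature vector from character and word features.

    `words` is assumed to come from an external word segmenter
    (hypothetical input); `word_weight` stands in for the weight
    coefficient, and its default value is an arbitrary assumption.
    """
    features = Counter()
    # Baseline model: uni-gram and bi-gram character counts.
    for n in (1, 2):
        for gram in char_ngrams(text, n):
            features[("char", gram)] += 1.0
    # Word features of length >= min_word_len, scaled by the weight.
    for word in words:
        if len(word) >= min_word_len:
            features[("word", word)] += word_weight
    return features


def mcnemar_statistic(b, c):
    """Continuity-corrected McNemar chi-square statistic.

    b: samples classifier A got right and classifier B got wrong.
    c: samples classifier A got wrong and classifier B got right.
    """
    if b + c == 0:
        return 0.0  # the classifiers never disagree
    return (abs(b - c) - 1) ** 2 / (b + c)


if __name__ == "__main__":
    # Hypothetical example: a six-character text with two segmented words.
    vec = combined_features("自然语言处理", ["自然语言", "处理"])
    for key, value in sorted(vec.items()):
        print(key, value)
    print(mcnemar_statistic(10, 2))
```

Tagging each feature with its type ("char" or "word") keeps a two-character word distinct from the identical character bi-gram, so the weight applies only to the word features.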

Original language: English
Pages (from-to): 166-174
Number of pages: 9
Journal: IEEJ Transactions on Electrical and Electronic Engineering
Volume: 10
Issue number: 2
Publication status: Published - Mar 1 2015

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering
