In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Character uni-gram and bi-gram features serve as the baseline model and are then combined with word features of length greater than or equal to 3. We also introduce a weight coefficient that gives higher weight to the word features. We further employ a serial approach based on feature transformation and dimension reduction techniques. The results of McNemar's test indicate that our proposed method significantly improves classification performance.
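The feature combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that word length is counted in characters, that the word list comes from some external segmenter (the `words` argument is hypothetical input), and that the weight coefficient is simply added to a word feature each time the word occurs. The names `char_ngrams`, `combined_features`, and `word_weight`, and the value 2.0, are illustrative only.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def combined_features(text, words, word_weight=2.0):
    """Build a weighted bag of character and word features.

    text        -- raw document string
    words       -- tokens from a word segmenter (assumed to be given)
    word_weight -- assumed weight coefficient for word features
    """
    features = Counter()
    # Baseline: character uni-grams and bi-grams, each counted once per occurrence.
    for n in (1, 2):
        for gram in char_ngrams(text, n):
            features[gram] += 1.0
    # Word features: only words of 3 or more characters, with a higher weight.
    for word in words:
        if len(word) >= 3:
            features[word] += word_weight
    return features


text = "自然语言处理"
words = ["自然", "语言", "处理", "自然语言处理"]  # hypothetical segmentation
feats = combined_features(text, words)
```

Restricting word features to three or more characters keeps them from duplicating the uni-gram and bi-gram features, so the two feature sets can share one vocabulary without key collisions.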
|Number of pages|9|
|Journal|IEEJ Transactions on Electrical and Electronic Engineering|
|Publication status|Published - Mar 1 2015|
All Science Journal Classification (ASJC) codes
- Electrical and Electronic Engineering