TY - JOUR
T1 - Improvement of automatic Chinese text classification by combining multiple features
AU - Luo, Xi
AU - Ohyama, Wataru
AU - Wakabayashi, Tetsushi
AU - Kimura, Fumitaka
N1 - Publisher Copyright: © 2014 Institute of Electrical Engineers of Japan.
PY - 2015/3/1
Y1 - 2015/3/1
AB - In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Uni-gram and bi-gram features are considered as the baseline model, which are then combined with word features of length greater than or equal to 3. A weight coefficient that can be used to give higher weights to word features is also introduced. We further employ a serial approach based on feature transformation and dimension reduction techniques. The results of McNemar's test indicate that the performance is significantly improved by our proposed method.
UR - http://www.scopus.com/inward/record.url?scp=84922722190&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84922722190&partnerID=8YFLogxK
U2 - 10.1002/tee.22049
DO - 10.1002/tee.22049
M3 - Article
AN - SCOPUS:84922722190
VL - 10
SP - 166
EP - 174
JO - IEEJ Transactions on Electrical and Electronic Engineering
JF - IEEJ Transactions on Electrical and Electronic Engineering
SN - 1931-4973
IS - 2
ER -