Abstract
In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Uni-gram and bi-gram features are considered as the baseline model, which are then combined with word features of length greater than or equal to 3. A weight coefficient that can be used to give higher weights to word features is also introduced. We further employ a serial approach based on feature transformation and dimension reduction techniques. The results of McNemar's test indicate that the performance is significantly improved by our proposed method.
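The feature combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `word_weight` parameter, the use of raw counts, and the assumption that word segmentation has already been done elsewhere are all hypothetical choices made for illustration. The second function computes the standard McNemar chi-square statistic with continuity correction from the two discordant counts. Whether the paper used this corrected form is not stated in the abstract.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def combined_features(text, words, word_weight=2.0):
    """Count uni-grams and bi-grams, then add weighted counts of long words.

    `words` is a pre-segmented token list (segmentation is assumed to be
    done elsewhere). Only words of length >= 3 are added, because shorter
    ones are already covered by the uni-gram and bi-gram features.
    `word_weight` is a hypothetical stand-in for the weight coefficient.
    """
    features = Counter()
    # Baseline model: character uni-grams and bi-grams, each counted once.
    for n in (1, 2):
        for gram in char_ngrams(text, n):
            features[("char", gram)] += 1.0
    # Word features of length >= 3, scaled by the weight coefficient.
    for word in words:
        if len(word) >= 3:
            features[("word", word)] += word_weight
    return dict(features)


def mcnemar_statistic(b, c):
    """McNemar chi-square statistic with continuity correction.

    `b` counts samples only classifier A got right, `c` counts samples
    only classifier B got right. Returns 0.0 when there are no
    discordant samples.
    """
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)
```

Calling `combined_features("中文文本分类", ["中文", "文本分类"])` keeps the two-character word `中文` only as a bi-gram and adds `文本分类` as a weighted word feature. A McNemar statistic above 3.841 corresponds to significance at the 0.05 level with one degree of freedom.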
Original language | English |
---|---|
Pages (from-to) | 166-174 |
Number of pages | 9 |
Journal | IEEJ Transactions on Electrical and Electronic Engineering |
Volume | 10 |
Issue number | 2 |
DOI | 10.1002/tee.22049 |
Publication status | Published - Mar 1, 2015 |
All Science Journal Classification (ASJC) codes
- Electrical and Electronic Engineering
Cite this
Improvement of automatic Chinese text classification by combining multiple features. / Luo, Xi; Ohyama, Wataru; Wakabayashi, Tetsushi; Kimura, Fumitaka.
In: IEEJ Transactions on Electrical and Electronic Engineering, Vol. 10, No. 2, 01.03.2015, p. 166-174.
Research output: Contribution to journal › Article
TY - JOUR
T1 - Improvement of automatic Chinese text classification by combining multiple features
AU - Luo, Xi
AU - Ohyama, Wataru
AU - Wakabayashi, Tetsushi
AU - Kimura, Fumitaka
PY - 2015/3/1
Y1 - 2015/3/1
N2 - In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Uni-gram and bi-gram features are considered as the baseline model, which are then combined with word features of length greater than or equal to 3. A weight coefficient that can be used to give higher weights to word features is also introduced. We further employ a serial approach based on feature transformation and dimension reduction techniques. The results of McNemar's test indicate that the performance is significantly improved by our proposed method.
UR - http://www.scopus.com/inward/record.url?scp=84922722190&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84922722190&partnerID=8YFLogxK
U2 - 10.1002/tee.22049
DO - 10.1002/tee.22049
M3 - Article
AN - SCOPUS:84922722190
VL - 10
SP - 166
EP - 174
JO - IEEJ Transactions on Electrical and Electronic Engineering
JF - IEEJ Transactions on Electrical and Electronic Engineering
SN - 1931-4973
IS - 2
ER -