Improvement of automatic Chinese text classification by combining multiple features

Xi Luo, Wataru Ohyama, Tetsushi Wakabayashi, Fumitaka Kimura

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Uni-gram and bi-gram features are considered as the baseline model, which are then combined with word features of length greater than or equal to 3. A weight coefficient that can be used to give higher weights to word features is also introduced. We further employ a serial approach based on feature transformation and dimension reduction techniques. The results of McNemar's test indicate that the performance is significantly improved by our proposed method.
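
The feature-combination step in the abstract can be illustrated with a minimal Python sketch. It makes the following assumptions:

- The document has already been split into words by some Chinese word segmenter. The abstract does not name one.
- Every character uni-gram and bi-gram counts as a baseline feature.
- Segmented words of three or more characters are added as extra features, and their counts are scaled by a weight coefficient.

The names `combined_features`, `word_weight` and `min_word_len`, the `U:`/`B:`/`W:` key prefixes, and the default weight of 2.0 are hypothetical choices for illustration. They are not taken from the paper.

```python
def combined_features(words: list[str], word_weight: float = 2.0,
                      min_word_len: int = 3) -> dict[str, float]:
    """Sketch: character uni-/bi-gram counts plus weighted long-word counts.

    `words` is assumed to be the output of a word segmenter (hypothetical input);
    `word_weight` and `min_word_len` are illustrative parameters, not the paper's.
    """
    text = "".join(words)  # recover the unsegmented character sequence
    feats: dict[str, float] = {}

    # Baseline: character uni-grams.
    for ch in text:
        key = "U:" + ch
        feats[key] = feats.get(key, 0.0) + 1.0

    # Baseline: character bi-grams (adjacent character pairs).
    for i in range(len(text) - 1):
        key = "B:" + text[i:i + 2]
        feats[key] = feats.get(key, 0.0) + 1.0

    # Word features: only words of length >= min_word_len, since shorter
    # words coincide with uni-grams or bi-grams already counted above.
    for w in words:
        if len(w) >= min_word_len:
            key = "W:" + w
            feats[key] = feats.get(key, 0.0) + word_weight

    return feats
```

For example, `combined_features(["计算机", "计算"], word_weight=3.0)` gives a count of 2 for each of `U:计`, `U:算` and `B:计算`, and a weighted value of 3.0 for `W:计算机`. The two-character word `计算` is not added as a separate word feature because it is identical to a bi-gram already in the baseline.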

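The significance claim rests on McNemar's test, which compares two classifiers evaluated on the same test documents. It uses only the documents on which exactly one of the two classifiers is correct.

The sketch below uses the common chi-square form with a continuity correction, (|n_A - n_B| - 1)^2 / (n_A + n_B). Here n_A counts documents that only classifier A gets right, and n_B counts those that only classifier B gets right. The p-value comes from the chi-square distribution with one degree of freedom.

The abstract does not say whether the paper used this variant or the exact binomial form. The function name `mcnemar_test` is hypothetical.

```python
import math
from typing import Hashable, Sequence


def mcnemar_test(y_true: Sequence[Hashable], pred_a: Sequence[Hashable],
                 pred_b: Sequence[Hashable]) -> tuple[int, int, float, float]:
    """Continuity-corrected McNemar test (a sketch; variant is an assumption).

    Returns (only_a, only_b, chi2_statistic, p_value).
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all sequences must have the same length")

    only_a = only_b = 0  # discordant pairs
    for t, pa, pb in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = pa == t, pb == t
        if a_ok and not b_ok:
            only_a += 1
        elif b_ok and not a_ok:
            only_b += 1

    n = only_a + only_b
    if n == 0:  # the classifiers never disagree in correctness
        return only_a, only_b, 0.0, 1.0

    # Continuity correction, clamped at 0 so that equal counts give chi2 = 0.
    stat = max(abs(only_a - only_b) - 1, 0) ** 2 / n
    # Survival function of chi-square with 1 d.o.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return only_a, only_b, stat, p_value
```

At an assumed significance level of 0.05, a p-value below 0.05 would indicate that the two classifiers' error rates on that test set differ significantly.
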
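Original language: English
Pages (from-to): 166-174
Number of pages: 9
Journal: IEEJ Transactions on Electrical and Electronic Engineering
Volume: 10
Issue number: 2
DOI: 10.1002/tee.22049
Publication status: Published - Mar 1 2015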

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering

Cite this

Improvement of automatic Chinese text classification by combining multiple features. / Luo, Xi; Ohyama, Wataru; Wakabayashi, Tetsushi; Kimura, Fumitaka.

In: IEEJ Transactions on Electrical and Electronic Engineering, Vol. 10, No. 2, 01.03.2015, p. 166-174.


Luo, Xi ; Ohyama, Wataru ; Wakabayashi, Tetsushi ; Kimura, Fumitaka. / Improvement of automatic Chinese text classification by combining multiple features. In: IEEJ Transactions on Electrical and Electronic Engineering. 2015 ; Vol. 10, No. 2. pp. 166-174.
@article{cff2ad3eee14480b9d6c4ec2744573b4,
title = "Improvement of automatic Chinese text classification by combining multiple features",
abstract = "In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Uni-gram and bi-gram features are considered as the baseline model, which are then combined with word features of length greater than or equal to 3. A weight coefficient that can be used to give higher weights to word features is also introduced. We further employ a serial approach based on feature transformation and dimension reduction techniques. The results of McNemar's test indicate that the performance is significantly improved by our proposed method.",
author = "Xi Luo and Wataru Ohyama and Tetsushi Wakabayashi and Fumitaka Kimura",
year = "2015",
month = "3",
day = "1",
doi = "10.1002/tee.22049",
language = "English",
volume = "10",
pages = "166--174",
journal = "IEEJ Transactions on Electrical and Electronic Engineering",
issn = "1931-4973",
publisher = "John Wiley and Sons Inc.",
number = "2",

}

TY  - JOUR
T1  - Improvement of automatic Chinese text classification by combining multiple features
AU  - Luo, Xi
AU  - Ohyama, Wataru
AU  - Wakabayashi, Tetsushi
AU  - Kimura, Fumitaka
PY  - 2015/3/1
Y1  - 2015/3/1
N2  - In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Uni-gram and bi-gram features are considered as the baseline model, which are then combined with word features of length greater than or equal to 3. A weight coefficient that can be used to give higher weights to word features is also introduced. We further employ a serial approach based on feature transformation and dimension reduction techniques. The results of McNemar's test indicate that the performance is significantly improved by our proposed method.
AB  - In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Uni-gram and bi-gram features are considered as the baseline model, which are then combined with word features of length greater than or equal to 3. A weight coefficient that can be used to give higher weights to word features is also introduced. We further employ a serial approach based on feature transformation and dimension reduction techniques. The results of McNemar's test indicate that the performance is significantly improved by our proposed method.
UR  - http://www.scopus.com/inward/record.url?scp=84922722190&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84922722190&partnerID=8YFLogxK
U2  - 10.1002/tee.22049
DO  - 10.1002/tee.22049
M3  - Article
AN  - SCOPUS:84922722190
VL  - 10
SP  - 166
EP  - 174
JO  - IEEJ Transactions on Electrical and Electronic Engineering
JF  - IEEJ Transactions on Electrical and Electronic Engineering
SN  - 1931-4973
IS  - 2
ER  - 