Improvement of automatic Chinese text classification by combining multiple features

Xi Luo, Wataru Oyama, Tetsushi Wakabayashi, Fumitaka Kimura

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Uni-gram and bi-gram features serve as the baseline model and are then combined with word features of length 3 or greater. We also introduce a weight coefficient that can give higher weights to word features. We further employ a serial approach based on feature transformation and dimension reduction techniques. McNemar's test indicates that the proposed method significantly improves classification performance.
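The abstract describes the feature combination only at a high level. The Python sketch below is one minimal reading of it, under stated assumptions: word segmentation is supplied by an external tool, features are raw counts, and each word of length 3 or more is counted with a weight `word_weight`. The names `combined_features`, `word_weight`, and `min_word_len`, and the default weight of 2.0, are hypothetical. The sketch does not reproduce the authors' actual weighting, classifier, or serial feature-transformation and dimension-reduction stage, none of which the abstract specifies. It also shows McNemar's test in its common continuity-corrected form for comparing two classifiers on the same test set; whether the paper used the corrected or uncorrected form is not stated here.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """All overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def combined_features(text: str, words: list[str],
                      word_weight: float = 2.0, min_word_len: int = 3) -> Counter:
    """Hypothetical sketch: character uni-grams and bi-grams (the baseline)
    plus word features of length >= min_word_len, each word occurrence
    counted with weight `word_weight`. `words` is assumed to come from an
    external word segmenter; segmentation is not performed here."""
    feats: Counter = Counter()
    for n in (1, 2):                      # baseline: uni-grams and bi-grams
        for gram in char_ngrams(text, n):
            feats["c:" + gram] += 1.0     # prefix keeps char and word keys apart
    for w in words:
        if len(w) >= min_word_len:        # keep only the longer words
            feats["w:" + w] += word_weight
    return feats


def mcnemar(y_true, pred_a, pred_b) -> tuple[float, float]:
    """McNemar's test with continuity correction for two classifiers
    evaluated on the same samples. Returns (chi-square statistic, p-value)."""
    n01 = n10 = 0
    for t, a, b in zip(y_true, pred_a, pred_b):
        if a == t and b != t:
            n01 += 1                      # only classifier A is correct
        elif a != t and b == t:
            n10 += 1                      # only classifier B is correct
    if n01 + n10 == 0:
        return 0.0, 1.0                   # no disagreements: no evidence of a difference
    stat = max(abs(n01 - n10) - 1, 0) ** 2 / (n01 + n10)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    feats = combined_features("中文文本分类", ["中文", "文本分类"])
    print(feats["c:文"], feats["w:文本分类"], "w:中文" in feats)  # prints: 2.0 2.0 False
    stat, p = mcnemar([1] * 17, [1] * 10 + [0] * 2 + [1] * 5,
                      [0] * 10 + [1] * 2 + [1] * 5)
    print(round(stat, 4), round(p, 4))
```

In the usage example, the two-character word 中文 is dropped because it is shorter than `min_word_len`, while 文本分类 is kept with weight 2.0. For the McNemar call, classifier A alone is correct on 10 samples and classifier B alone on 2, giving a statistic of 49/12 (about 4.08) and a p-value of about 0.043.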

Original language: English
Pages (from-to): 166-174
Number of pages: 9
Journal: IEEJ Transactions on Electrical and Electronic Engineering
Volume: 10
Issue number: 2
DOI: 10.1002/tee.22049
Publication status: Published - Mar 1 2015

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering

Cite this

Improvement of automatic Chinese text classification by combining multiple features. / Luo, Xi; Oyama, Wataru; Wakabayashi, Tetsushi; Kimura, Fumitaka.

In: IEEJ Transactions on Electrical and Electronic Engineering, Vol. 10, No. 2, 01.03.2015, p. 166-174.


Luo, Xi; Oyama, Wataru; Wakabayashi, Tetsushi; Kimura, Fumitaka. / Improvement of automatic Chinese text classification by combining multiple features. In: IEEJ Transactions on Electrical and Electronic Engineering. 2015; Vol. 10, No. 2. pp. 166-174.
@article{cff2ad3eee14480b9d6c4ec2744573b4,
title = "Improvement of automatic Chinese text classification by combining multiple features",
abstract = "In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Uni-gram and bi-gram features are considered as the baseline model, which are then combined with word features of length greater than or equal to 3. A weight coefficient that can be used to give higher weights to word features is also introduced. We further employ a serial approach based on feature transformation and dimension reduction techniques. The results of McNemar's test indicate that the performance is significantly improved by our proposed method.",
author = "Xi Luo and Wataru Oyama and Tetsushi Wakabayashi and Fumitaka Kimura",
year = "2015",
month = "3",
day = "1",
doi = "10.1002/tee.22049",
language = "English",
volume = "10",
pages = "166--174",
journal = "IEEJ Transactions on Electrical and Electronic Engineering",
issn = "1931-4973",
publisher = "John Wiley and Sons Inc.",
number = "2",

}

TY  - JOUR
T1  - Improvement of automatic Chinese text classification by combining multiple features
AU  - Luo, Xi
AU  - Oyama, Wataru
AU  - Wakabayashi, Tetsushi
AU  - Kimura, Fumitaka
PY  - 2015/3/1
Y1  - 2015/3/1
N2  - In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Uni-gram and bi-gram features are considered as the baseline model, which are then combined with word features of length greater than or equal to 3. A weight coefficient that can be used to give higher weights to word features is also introduced. We further employ a serial approach based on feature transformation and dimension reduction techniques. The results of McNemar's test indicate that the performance is significantly improved by our proposed method.
AB  - In this paper, we present an effective way of combining character-based (N-gram) and word-based approaches for Chinese text classification. Uni-gram and bi-gram features are considered as the baseline model, which are then combined with word features of length greater than or equal to 3. A weight coefficient that can be used to give higher weights to word features is also introduced. We further employ a serial approach based on feature transformation and dimension reduction techniques. The results of McNemar's test indicate that the performance is significantly improved by our proposed method.
UR  - http://www.scopus.com/inward/record.url?scp=84922722190&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84922722190&partnerID=8YFLogxK
U2  - 10.1002/tee.22049
DO  - 10.1002/tee.22049
M3  - Article
AN  - SCOPUS:84922722190
VL  - 10
SP  - 166
EP  - 174
JO  - IEEJ Transactions on Electrical and Electronic Engineering
JF  - IEEJ Transactions on Electrical and Electronic Engineering
SN  - 1931-4973
IS  - 2
ER  - 