Automatic Chinese text classification using character-based and word-based approach

Xi Luo, Wataru Oyama, Tetsushi Wakabayashi, Fumitaka Kimura

Research output: Contribution to journal > Conference article

1 Citation (Scopus)

Abstract

In this paper, we study Chinese text classification using a character-based approach (N-gram) and a word-based approach, and propose the use of uni-gram, bi-gram, and word features of length greater than or equal to three. A weight coefficient that can be used to give higher weights to word features is also introduced. We further investigate a serial approach based on feature transformation and dimension reduction techniques to improve performance. Experimental results show that the proposed approach is efficient and effective for improving the performance of Chinese text classification.
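
The feature set described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extract_features`, the default `word_weight` value, and the assumption that words come pre-segmented from an external tool are all hypothetical, since the abstract gives no such details. The sketch counts character uni-grams and bi-grams, then adds word features of length three or more, scaled by a weight coefficient.

```python
from collections import Counter


def extract_features(text, words, word_weight=2.0):
    """Combine character uni-grams, bi-grams, and long-word features.

    Assumptions (not from the paper): `words` is produced by an external
    word segmenter, and `word_weight` is an illustrative value for the
    weight coefficient applied to word features.
    """
    chars = [c for c in text if not c.isspace()]
    features = Counter()

    # Character-based features: uni-grams and bi-grams.
    for c in chars:
        features[c] += 1.0
    for a, b in zip(chars, chars[1:]):
        features[a + b] += 1.0

    # Word-based features: keep only words of length >= 3 and give
    # each occurrence a higher weight than a character n-gram.
    for w in words:
        if len(w) >= 3:
            features[w] += word_weight

    return features


if __name__ == "__main__":
    segmented = ["我", "爱", "自然语言", "处理"]
    feats = extract_features("我爱自然语言处理", segmented)
    for name, value in sorted(feats.items()):
        print(name, value)
```

Words of length one or two are left to the uni-gram and bi-gram features, so the word features only add information the character n-grams cannot capture.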

Original language: English
Article number: 6628638
Pages (from-to): 329-333
Number of pages: 5
Journal: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
DOI: 10.1109/ICDAR.2013.73
Publication status: Published - Dec 11 2013
Externally published: Yes
Event: 12th International Conference on Document Analysis and Recognition, ICDAR 2013 - Washington, DC, United States
Duration: Aug 25 2013 - Aug 28 2013

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition

Cite this

Automatic Chinese text classification using character-based and word-based approach. / Luo, Xi; Oyama, Wataru; Wakabayashi, Tetsushi; Kimura, Fumitaka.

In: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 11.12.2013, p. 329-333.

@article{ae00a3bf7a684d08ab0d1d64570f6499,
title = "Automatic {C}hinese text classification using character-based and word-based approach",
abstract = "In this paper, we study Chinese text classification using a character-based approach (N-gram) and a word-based approach, and propose the use of uni-gram, bi-gram, and word features of length greater than or equal to three. A weight coefficient that can be used to give higher weights to word features is also introduced. We further investigate a serial approach based on feature transformation and dimension reduction techniques to improve performance. Experimental results show that the proposed approach is efficient and effective for improving the performance of Chinese text classification.",
author = "Xi Luo and Wataru Oyama and Tetsushi Wakabayashi and Fumitaka Kimura",
year = "2013",
month = "12",
day = "11",
doi = "10.1109/ICDAR.2013.73",
language = "English",
pages = "329--333",
journal = "Proceedings of the International Conference on Document Analysis and Recognition, ICDAR",
issn = "1520-5363",

}

TY  - JOUR
T1  - Automatic Chinese text classification using character-based and word-based approach
AU  - Luo, Xi
AU  - Oyama, Wataru
AU  - Wakabayashi, Tetsushi
AU  - Kimura, Fumitaka
PY  - 2013/12/11
Y1  - 2013/12/11
N2  - In this paper, we study Chinese text classification using a character-based approach (N-gram) and a word-based approach, and propose the use of uni-gram, bi-gram, and word features of length greater than or equal to three. A weight coefficient that can be used to give higher weights to word features is also introduced. We further investigate a serial approach based on feature transformation and dimension reduction techniques to improve performance. Experimental results show that the proposed approach is efficient and effective for improving the performance of Chinese text classification.
AB  - In this paper, we study Chinese text classification using a character-based approach (N-gram) and a word-based approach, and propose the use of uni-gram, bi-gram, and word features of length greater than or equal to three. A weight coefficient that can be used to give higher weights to word features is also introduced. We further investigate a serial approach based on feature transformation and dimension reduction techniques to improve performance. Experimental results show that the proposed approach is efficient and effective for improving the performance of Chinese text classification.
UR  - http://www.scopus.com/inward/record.url?scp=84889582081&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84889582081&partnerID=8YFLogxK
U2  - 10.1109/ICDAR.2013.73
DO  - 10.1109/ICDAR.2013.73
M3  - Conference article
AN  - SCOPUS:84889582081
SP  - 329
EP  - 333
JO  - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
JF  - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SN  - 1520-5363
M1  - 6628638
ER  -