Impact of word segmentation errors on automatic Chinese text classification

Xi Luo, Wataru Ohyama, Tetsushi Wakabayashi, Fumitaka Kimura

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

In this paper, several sets of experiments were carried out to study the impact of word segmentation errors on automatic Chinese text classification. A comparison of four word-based approaches was carried out first; the results show that performance was significantly reduced when automatic word segmentation was used instead of manual word segmentation, which means that errors caused by automatic word segmentation have an obvious impact on classification performance. We then conducted an experiment using a character-based approach (N-gram). Although the N-gram approach produces a large number of ambiguous words, the results show that it performed better than automatic word segmentation.
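As a rough illustration of the character-based idea mentioned in the abstract, the sketch below extracts overlapping character N-grams from a Chinese string without any word segmentation. It is not the authors' implementation: the function names `char_ngrams` and `ngram_features`, the choice of bigrams, and the sample string are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_features(text, n=2):
    """Count n-gram occurrences; a hypothetical bag-of-n-grams feature map."""
    return Counter(char_ngrams(text, n))


# Hypothetical sample: "Chinese text classification" written without spaces.
sample = "中文文本分类"
bigrams = char_ngrams(sample)
print(bigrams)
```

For this sample, the bigrams include units such as 文文 that straddle word boundaries, which is the kind of ambiguous "word" the abstract says the N-gram approach produces in large numbers.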

Original language: English
Title of host publication: Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
Pages: 271-275
Number of pages: 5
Publication status: Published - 2012
Event: 10th IAPR International Workshop on Document Analysis Systems, DAS 2012 - Gold Coast, QLD, Australia
Duration: Mar 27 2012 to Mar 29 2012

Publication series

Name: Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012

Other

Other: 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
Country: Australia
City: Gold Coast, QLD
Period: 3/27/12 to 3/29/12

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
