TY - GEN
T1 - Impact of word segmentation errors on automatic Chinese text classification
AU - Luo, Xi
AU - Ohyama, Wataru
AU - Wakabayashi, Tetsushi
AU - Kimura, Fumitaka
N1 - Copyright 2012 Elsevier B.V., All rights reserved.
PY - 2012
Y1 - 2012
N2 - In this paper, several sets of experiments were carried out to study the impact of word segmentation errors on automatic Chinese text classification. A comparison of four word-based approaches was carried out first; the results show that performance was significantly reduced when automatic word segmentation was used instead of manual word segmentation, which means that errors caused by automatic word segmentation have an obvious impact on classification performance. We further conducted an experiment using a character-based approach (N-gram). Although the N-gram approach produces a large number of ambiguous words, the results show that it performed better than automatic word segmentation.
AB - In this paper, several sets of experiments were carried out to study the impact of word segmentation errors on automatic Chinese text classification. A comparison of four word-based approaches was carried out first; the results show that performance was significantly reduced when automatic word segmentation was used instead of manual word segmentation, which means that errors caused by automatic word segmentation have an obvious impact on classification performance. We further conducted an experiment using a character-based approach (N-gram). Although the N-gram approach produces a large number of ambiguous words, the results show that it performed better than automatic word segmentation.
UR - http://www.scopus.com/inward/record.url?scp=84862099733&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84862099733&partnerID=8YFLogxK
U2 - 10.1109/DAS.2012.43
DO - 10.1109/DAS.2012.43
M3 - Conference contribution
AN - SCOPUS:84862099733
SN - 9780769546612
T3 - Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
SP - 271
EP - 275
BT - Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
T2 - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
Y2 - 27 March 2012 through 29 March 2012
ER -