Automatic text classification of English newswire articles based on statistical classification techniques

Guowei Zu, Wataru Oyama, Tetsushi Wakabayashi, Fumitaka Kimura

Research output: Contribution to journal › Article › peer-review

Abstract

Automatic text classification consists of learning a classification scheme from training examples and then using it to classify unseen documents. This is essentially the same process as graphic or character pattern recognition, so pattern recognition approaches can also be applied to automatic text categorization. In this research, several statistical classification techniques, including Euclidean distance, various similarity measures, the linear discriminant function, projection distance, modified projection distance, and support vector machines (SVM), were applied to automatic text classification. Principal component analysis was used to reduce the dimensionality of the feature vector. Comparative experiments were conducted on the Reuters-21578 test collection of English newswire articles. The results show that the overall efficiency of modified projection distance is better than that of the other methods, and that principal component analysis is suitable for reducing the dimensionality of text features.
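The abstract names principal component analysis and projection distance but gives no formulas. The following is a minimal NumPy sketch, assuming projection distance takes its common form: the squared distance from a feature vector x to a class mean m_c, minus the squared length of (x - m_c) projected onto the top-k principal axes of that class, with the document assigned to the class at the smallest distance. It is not the authors' implementation. Modified projection distance, the similarity measures, the linear discriminant function, and SVM are not shown, and the function names, the parameters `n_components` and `k`, and the toy data are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the mean and the top principal axes of X (rows are documents)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, axes):
    """Project documents onto the retained principal axes."""
    return (X - mean) @ axes.T


class ProjectionDistanceClassifier:
    """Hypothetical projection-distance classifier (common textbook form)."""

    def __init__(self, k):
        self.k = k  # number of principal axes kept per class

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_, self.bases_ = [], []
        for c in self.classes_:
            Xc = X[y == c]
            m = Xc.mean(axis=0)
            # Orthonormal principal axes of the class's own scatter.
            _, _, Vt = np.linalg.svd(Xc - m, full_matrices=False)
            self.means_.append(m)
            self.bases_.append(Vt[: self.k])
        return self

    def distances(self, X):
        """Squared distance to each class mean minus its part in the class subspace."""
        D = np.empty((X.shape[0], len(self.classes_)))
        for j, (m, B) in enumerate(zip(self.means_, self.bases_)):
            diff = X - m
            proj = diff @ B.T  # coordinates along the k class axes
            D[:, j] = np.sum(diff**2, axis=1) - np.sum(proj**2, axis=1)
        return D

    def predict(self, X):
        # Assign each document to the class with the smallest projection distance.
        return self.classes_[np.argmin(self.distances(X), axis=1)]


# Toy data (synthetic, not Reuters-21578): 2 categories x 40 documents, 50 term features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 50)), rng.normal(3.0, 1.0, (40, 50))])
y = np.array([0] * 40 + [1] * 40)

mean, axes = pca_fit(X, n_components=10)
Z = pca_transform(X, mean, axes)
clf = ProjectionDistanceClassifier(k=3).fit(Z, y)
accuracy = np.mean(clf.predict(Z) == y)
print(f"training accuracy: {accuracy:.2f}")
```

For simplicity the toy example fits PCA and evaluates on the same data; in an actual experiment the principal axes and class subspaces would be estimated from the training documents only and then applied to the test documents, with k kept smaller than the number of training documents in each class.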

Original language: English
Pages (from-to): 50-60
Number of pages: 11
Journal: Electrical Engineering in Japan (English translation of Denki Gakkai Ronbunshi)
Volume: 152
Issue number: 1
Publication status: Published, Jul 15 2005

All Science Journal Classification (ASJC) codes

  • Energy Engineering and Power Technology
  • Electrical and Electronic Engineering
