Automatic Text Classification of English Newswire Articles Based on Statistical Classification Techniques

Guowei Zu, Wataru Oyama, Tetsushi Wakabayashi, Fumitaka Kimura

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Automatic text classification learns a classification scheme from training examples and then uses it to classify unseen text documents. The process is essentially the same as graphic or character pattern recognition, so pattern recognition approaches can be applied to automatic text categorization. In this research, several statistical classification techniques were applied to automatic text classification: Euclidean distance, various similarity measures, the linear discriminant function, projection distance, modified projection distance, SVM, and nearest-neighbor. Principal component analysis was used to reduce the dimensionality of the feature vector. Comparative experiments were conducted on the Reuters-21578 test collection of English newswire articles. The results show that modified projection distance outperformed the other methods overall and that principal component analysis is suitable for reducing the dimensionality of text features.
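
As a rough illustration of one technique named in the abstract, the sketch below classifies a vector by projection distance: each class is modeled by its mean and its leading principal axes, and a sample is assigned to the class whose subspace it lies closest to. It assumes the common subspace-method definition of projection distance, which may differ in detail from the formulation used in the paper. The function names, the toy 3-dimensional data, and the subspace dimension `k` are all illustrative assumptions; nothing here uses Reuters-21578 or reproduces the reported experiments.

```python
import numpy as np


def fit_class_subspace(X, k):
    """Return the class mean and the top-k principal axes of one class.

    X has shape (n_samples, n_features). This is a hypothetical helper,
    not code from the paper.
    """
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes of the centered data,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def projection_distance(x, mean, axes):
    """Squared distance from x to the subspace through `mean` spanned by `axes`."""
    d = x - mean
    coords = axes @ d  # coordinates of d along each principal axis
    return float(d @ d - coords @ coords)


def classify(x, models):
    """Pick the label whose class subspace is closest to x."""
    return min(models, key=lambda label: projection_distance(x, *models[label]))


# Toy data (assumed for illustration): class "a" lies along the first axis,
# class "b" lies along the third axis, shifted by 5 along the second.
rng = np.random.default_rng(0)
t = rng.normal(size=(50, 1))
class_a = t * np.array([1.0, 0.0, 0.0]) + 0.01 * rng.normal(size=(50, 3))
class_b = (np.array([0.0, 5.0, 0.0])
           + t * np.array([0.0, 0.0, 1.0])
           + 0.01 * rng.normal(size=(50, 3)))

models = {
    "a": fit_class_subspace(class_a, k=1),
    "b": fit_class_subspace(class_b, k=1),
}

label_a = classify(np.array([3.0, 0.0, 0.0]), models)
label_b = classify(np.array([0.0, 5.0, 2.0]), models)
```

In a text setting, each row of `X` would be a document feature vector (for example, term weights), possibly after a global dimensionality reduction step such as the principal component analysis the abstract mentions.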

Original language: English
Pages (from-to): 852-860
Number of pages: 9
Journal: IEEJ Transactions on Electronics, Information and Systems
Volume: 124
Issue number: 3
DOI: 10.1541/ieejeiss.124.852
Publication status: Published - Jan 1 2004

Fingerprint

Principal component analysis
Pattern recognition
Experiments

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering

Cite this

Automatic Text Classification of English Newswire Articles Based on Statistical Classification Techniques. / Zu, Guowei; Oyama, Wataru; Wakabayashi, Tetsushi; Kimura, Fumitaka.

In: IEEJ Transactions on Electronics, Information and Systems, Vol. 124, No. 3, 01.01.2004, p. 852-860.

@article{cb0b74329798431ea1e01dfec21cdf52,
title = "Automatic Text Classification of English Newswire Articles Based on Statistical Classification Techniques",
abstract = "The basic process of automatic text classification is learning a classification scheme from training examples then using it to classify unseen textual documents. It is essentially the same as graphic or character pattern recognition process. So the pattern recognition approaches can be used for automatic text categorization. In this research several statistical classification techniques each of which employs Euclidean distance, various similarity measures, linear discriminant function, projection distance, modified projection distance, SVM, nearest-neighbor, have been used for automatic text classification. The principal component analysis was used to reduce the dimensionality of the feature vector. Comparative experiments have been conducted on the Reuters-21578 test collection of English newswire articles. The results illustrate that the efficiency of modified projection distance is totally better than the other methods and the principal component analysis is suitable for reducing the dimensionality of the text features.",
author = "Guowei Zu and Wataru Oyama and Tetsushi Wakabayashi and Fumitaka Kimura",
year = "2004",
month = "1",
day = "1",
doi = "10.1541/ieejeiss.124.852",
language = "English",
volume = "124",
pages = "852--860",
journal = "IEEJ Transactions on Electronics, Information and Systems",
issn = "0385-4221",
publisher = "The Institute of Electrical Engineers of Japan",
number = "3",

}

TY - JOUR

T1 - Automatic Text Classification of English Newswire Articles Based on Statistical Classification Techniques

AU - Zu, Guowei

AU - Oyama, Wataru

AU - Wakabayashi, Tetsushi

AU - Kimura, Fumitaka

PY - 2004/1/1

Y1 - 2004/1/1

AB - The basic process of automatic text classification is learning a classification scheme from training examples then using it to classify unseen textual documents. It is essentially the same as graphic or character pattern recognition process. So the pattern recognition approaches can be used for automatic text categorization. In this research several statistical classification techniques each of which employs Euclidean distance, various similarity measures, linear discriminant function, projection distance, modified projection distance, SVM, nearest-neighbor, have been used for automatic text classification. The principal component analysis was used to reduce the dimensionality of the feature vector. Comparative experiments have been conducted on the Reuters-21578 test collection of English newswire articles. The results illustrate that the efficiency of modified projection distance is totally better than the other methods and the principal component analysis is suitable for reducing the dimensionality of the text features.

UR - http://www.scopus.com/inward/record.url?scp=33847744664&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33847744664&partnerID=8YFLogxK

U2 - 10.1541/ieejeiss.124.852

DO - 10.1541/ieejeiss.124.852

M3 - Article

AN - SCOPUS:33847744664

VL - 124

SP - 852

EP - 860

JO - IEEJ Transactions on Electronics, Information and Systems

JF - IEEJ Transactions on Electronics, Information and Systems

SN - 0385-4221

IS - 3

ER -