Classification of imbalanced documents by feature selection

Yusuke Adachi, Naoya Onimura, Takanori Yamashita, Sachio Hirokawa

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

We previously studied the category classification of Reuters news articles using SVM and feature selection. In that study, feature selection by SVM-score [Sakai, Hirokawa, 2012] achieved high accuracy, and it was expected to outperform other standard indicators when the data is imbalanced. This study aims to show the effectiveness of feature selection by SVM-score in machine learning with imbalanced data. For the Reuters data, the F-measure was calculated in classification experiments on all 13 categories. Feature selection by SVM-score achieved high F-measure and precision. In addition, we found that feature words of negative examples improve classification performance.
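The two ideas in the abstract, scoring by F-measure rather than accuracy on imbalanced data and keeping feature words from both the positive and the negative side, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the confusion counts, and the per-word scores are all hypothetical, and the scores simply stand in for whatever SVM-score assigns to each word.

```python
from typing import Dict, List, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def select_by_score(scores: Dict[str, float], n_pos: int, n_neg: int) -> List[str]:
    """Keep the n_pos highest-scoring and the n_neg lowest-scoring words.

    Assumes n_pos + n_neg <= len(scores), so the two groups do not overlap.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    positive = ranked[:n_pos]
    negative = ranked[len(ranked) - n_neg:] if n_neg else []
    return positive + negative


# Hypothetical imbalanced test set: 1000 documents, only 10 in the category.
# A classifier that rejects every document is right 990 times out of 1000,
# yet it finds none of the positive documents.
accuracy_all_negative = 990 / 1000
print("accuracy:", accuracy_all_negative)
print("P, R, F :", precision_recall_f1(tp=0, fp=0, fn=10))

# A classifier that finds 8 of the 10 positives with 2 false alarms.
print("P, R, F :", precision_recall_f1(tp=8, fp=2, fn=2))

# Hypothetical per-word scores (positive values point to the category,
# negative values point away from it).
word_scores = {
    "wheat": 1.9, "grain": 1.2, "tonnes": 0.4,
    "the": 0.0, "shares": -0.9, "profit": -1.5,
}
print(select_by_score(word_scores, n_pos=2, n_neg=2))
```

The first classifier shows why accuracy alone is a poor yardstick here: it scores 0.99 on accuracy while its F-measure is zero. The selection helper keeps words from both ends of the ranking, which is one simple way to retain negative-example words alongside positive ones.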

Original language: English
Title of host publication: Proceedings of 2017 International Conference on Compute and Data Analysis, ICCDA 2017
Publisher: Association for Computing Machinery
Pages: 228-232
Number of pages: 5
Volume: Part F130280
ISBN (Electronic): 9781450352413
DOI: https://doi.org/10.1145/3093241.3093246
Publication status: Published - May 19 2017
Event: 2017 International Conference on Compute and Data Analysis, ICCDA 2017 - Lakeland, United States
Duration: May 19 2017 to May 23 2017

Other

Other: 2017 International Conference on Compute and Data Analysis, ICCDA 2017
Country: United States
City: Lakeland
Period: 5/19/17 to 5/23/17

Fingerprint

Feature extraction
Learning systems
Experiments

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Adachi, Y., Onimura, N., Yamashita, T., & Hirokawa, S. (2017). Classification of imbalanced documents by feature selection. In Proceedings of 2017 International Conference on Compute and Data Analysis, ICCDA 2017 (Vol. Part F130280, pp. 228-232). Association for Computing Machinery. https://doi.org/10.1145/3093241.3093246

@inproceedings{827c6a7cbe714c73832642a669a3a622,
title = "Classification of imbalanced documents by feature selection",
abstract = "We previously studied the category classification of Reuters news articles using SVM and feature selection. In that study, feature selection by SVM-score [Sakai, Hirokawa, 2012] achieved high accuracy, and it was expected to outperform other standard indicators when the data is imbalanced. This study aims to show the effectiveness of feature selection by SVM-score in machine learning with imbalanced data. For the Reuters data, the F-measure was calculated in classification experiments on all 13 categories. Feature selection by SVM-score achieved high F-measure and precision. In addition, we found that feature words of negative examples improve classification performance.",
author = "Yusuke Adachi and Naoya Onimura and Takanori Yamashita and Sachio Hirokawa",
year = "2017",
month = "5",
day = "19",
doi = "10.1145/3093241.3093246",
language = "English",
volume = "Part F130280",
pages = "228--232",
booktitle = "Proceedings of 2017 International Conference on Compute and Data Analysis, ICCDA 2017",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - Classification of imbalanced documents by feature selection

AU - Adachi, Yusuke

AU - Onimura, Naoya

AU - Yamashita, Takanori

AU - Hirokawa, Sachio

PY - 2017/5/19

Y1 - 2017/5/19

AB - We previously studied the category classification of Reuters news articles using SVM and feature selection. In that study, feature selection by SVM-score [Sakai, Hirokawa, 2012] achieved high accuracy, and it was expected to outperform other standard indicators when the data is imbalanced. This study aims to show the effectiveness of feature selection by SVM-score in machine learning with imbalanced data. For the Reuters data, the F-measure was calculated in classification experiments on all 13 categories. Feature selection by SVM-score achieved high F-measure and precision. In addition, we found that feature words of negative examples improve classification performance.

UR - http://www.scopus.com/inward/record.url?scp=85030120453&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85030120453&partnerID=8YFLogxK

U2 - 10.1145/3093241.3093246

DO - 10.1145/3093241.3093246

M3 - Conference contribution

AN - SCOPUS:85030120453

VL - Part F130280

SP - 228

EP - 232

BT - Proceedings of 2017 International Conference on Compute and Data Analysis, ICCDA 2017

PB - Association for Computing Machinery

ER -