A comparative study on outlier removal from a large-scale dataset using unsupervised anomaly detection

Markus Goldstein, Seiichi Uchida

Research output: Contribution to book/report, Conference contribution

3 Citations (Scopus)

Abstract

Outlier removal from training data is a classical problem in pattern recognition. This problem has become more important for large-scale datasets for two reasons. First, there is a higher risk of "unexpected" outliers, such as mislabeled training data. Second, a large-scale dataset makes it more difficult to grasp the distribution of outliers. At the same time, many unsupervised anomaly detection methods have been proposed, which can also be used for outlier removal. In this paper, we present a comparative study of nine different anomaly detection methods in the scenario of outlier removal from a large-scale dataset. To observe performance accurately, we need a simple and describable recognition procedure and therefore use a nearest neighbor-based classifier. As an adequate large-scale dataset, we prepared a handwritten digit dataset comprising more than 800,000 manually labeled samples. With a data dimensionality of 16×16=256, each digit class is ensured to have at least 100 times more instances than the data dimensionality. The experimental results show that the common understanding that outlier removal improves classification performance on small datasets does not hold for high-dimensional large-scale datasets. Additionally, local anomaly detection algorithms were found to perform better on this data than their global equivalents.
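The pipeline described in the abstract (score training samples with an unsupervised anomaly detector, remove the highest-scoring ones, then classify with a nearest neighbor-based classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the global k-NN distance score stands in for whichever nine methods were compared, and the per-class removal, the `fraction` parameter, the function names, and the toy 2-D data are all hypothetical.

```python
import math


def knn_anomaly_scores(points, k):
    """Global k-NN score: mean distance from each point to its k nearest neighbors.

    A stand-in detector chosen for brevity, not necessarily one used in the paper.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def remove_outliers(samples, labels, k=2, fraction=0.2):
    """Within each class, drop the `fraction` of samples with the highest score.

    Per-class scoring and a fixed removal fraction are assumptions of this sketch.
    """
    kept_samples, kept_labels = [], []
    for cls in sorted(set(labels)):
        members = [s for s, lab in zip(samples, labels) if lab == cls]
        scores = knn_anomaly_scores(members, k)
        n_drop = int(len(members) * fraction)
        order = sorted(range(len(members)), key=lambda i: scores[i])
        for i in order[: len(members) - n_drop]:
            kept_samples.append(members[i])
            kept_labels.append(cls)
    return kept_samples, kept_labels


def classify_1nn(train_samples, train_labels, query):
    """Return the label of the training sample closest to `query`."""
    best = min(
        range(len(train_samples)),
        key=lambda i: math.dist(train_samples[i], query),
    )
    return train_labels[best]


# Toy data: class 0 clusters near the origin, class 1 near (10, 10).
# The point (10, 10) carries label 0 and plays the role of a mislabeled sample.
samples = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10),
    (10, 9), (9, 10), (9, 9), (11, 10), (10, 11),
]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
query = (10, 10.2)

clean_samples, clean_labels = remove_outliers(samples, labels)
print("before removal:", classify_1nn(samples, labels, query))
print("after removal: ", classify_1nn(clean_samples, clean_labels, query))
```

On this toy data the mislabeled point misleads the 1-NN classifier until it is removed. Note that a fixed removal fraction also discards a legitimate class 1 sample, which hints at why removal need not help on a large, clean dataset; the sketch is not evidence for the paper's results.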

Original language: English
Title of host publication: ICPRAM 2016 - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods
Editors: Maria De Marsico, Gabriella Sanniti di Baja, Ana Fred
Publisher: SciTePress
Pages: 263-269
Number of pages: 7
ISBN (Electronic): 9789897581731
DOI
Publication status: Published - 2016
Event: 5th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2016 - Rome, Italy
Duration: Feb 24, 2016 to Feb 26, 2016

Publication series

Name: ICPRAM 2016 - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods

Other

Other: 5th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2016
Country/Territory: Italy
City: Rome
Period: 2/24/16 to 2/26/16

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
