Discovering outlier filtering rules from unlabeled data

Kenji Yamanishi, Jun Ichi Takeuchi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

73 Citations (Scopus)

Abstract

This paper is concerned with the problem of detecting outliers from unlabeled data. In prior work we developed SmartSifter, an on-line outlier detection algorithm based on unsupervised learning from data. Building on SmartSifter, this paper presents a new framework for outlier filtering that uses supervised and unsupervised learning techniques iteratively in order to make the detection process more effective and more understandable. The outline of the framework is as follows. In the first round, for an initial dataset, we run SmartSifter to give each data point a score, with a high score indicating a high possibility of being an outlier. Next, we create labeled examples by giving positive labels to a number of higher-scored data points and negative labels to a number of lower-scored data points. Then we construct an outlier filtering rule by supervised learning from these examples. Here the rule is generated based on the principle of minimizing extended stochastic complexity. In the second round, for a new dataset, we filter the data using the constructed rule; then, among the filtered data, we run SmartSifter again to evaluate the data in order to update the filtering rule. Applying our framework to network intrusion detection, we demonstrate that 1) it can significantly improve the accuracy of SmartSifter, and 2) outlier filtering rules can help the user to discover a general pattern of an outlier group.
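
The two-round loop outlined in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: score_outliers is a simple z-score stand-in for SmartSifter, learn_rule picks a single-feature threshold by training error rather than by minimizing extended stochastic complexity, and all function names, parameters, and sample values are hypothetical. The second round assumes one reading of the abstract, namely that points matching the rule are flagged and the remaining points are re-scored to learn an updated rule.

```python
from statistics import mean, pstdev


def score_outliers(data):
    """Stand-in for SmartSifter (assumption): max absolute z-score over features."""
    cols = list(zip(*data))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [max(abs(x - m) / s for x, m, s in zip(row, mus, sds)) for row in data]


def make_labeled_examples(data, scores, n_pos, n_neg):
    """Label the n_pos highest-scored points 1 and the n_neg lowest-scored points 0."""
    order = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    pos = [(data[i], 1) for i in order[:n_pos]]
    neg = [(data[i], 0) for i in order[-n_neg:]]
    return pos + neg


def apply_rule(rule, row):
    """Return True when the row matches the filtering rule (is flagged)."""
    feature, direction, threshold = rule
    return row[feature] > threshold if direction == ">" else row[feature] < threshold


def learn_rule(examples):
    """Stand-in rule learner (assumption): best single-feature threshold
    by training error. The paper uses extended stochastic complexity instead."""
    best = None
    for feature in range(len(examples[0][0])):
        values = sorted({x[feature] for x, _ in examples})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            for direction in (">", "<"):
                rule = (feature, direction, (lo + hi) / 2)
                errors = sum(int(apply_rule(rule, x)) != y for x, y in examples)
                if best is None or errors < best[0]:
                    best = (errors, rule)
    if best is None:
        raise ValueError("no feature separates the labeled examples")
    return best[1]


def second_round(rule, new_data, n_pos, n_neg):
    """Flag rule matches, then re-score the rest and learn an updated rule."""
    flagged = [row for row in new_data if apply_rule(rule, row)]
    rest = [row for row in new_data if not apply_rule(rule, row)]
    examples = make_labeled_examples(rest, score_outliers(rest), n_pos, n_neg)
    return flagged, learn_rule(examples)


# Round 1: score an initial dataset, build labeled examples, learn a rule.
initial = [(1.0, 5.0), (1.1, 5.2), (0.9, 4.9), (1.0, 5.1), (1.2, 5.0), (9.0, 5.0)]
examples = make_labeled_examples(initial, score_outliers(initial), n_pos=1, n_neg=3)
rule = learn_rule(examples)

# Round 2: filter a new dataset with the rule, then update the rule.
new_data = [(1.0, 5.0), (8.0, 5.1), (1.1, 4.9), (0.9, 5.0), (1.0, 5.2)]
flagged, updated_rule = second_round(rule, new_data, n_pos=1, n_neg=2)
print(rule, flagged, updated_rule)
```

The learned rule is a readable (feature, direction, threshold) triple, which mirrors the abstract's point that a filtering rule can expose a general pattern shared by a group of outliers.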

Original language: English
Title of host publication: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: F. Provost, R. Srikant, M. Schkolnick, D. Lee
Publisher: Association for Computing Machinery (ACM)
Pages: 389-394
Number of pages: 6
ISBN (Print): 158113391X, 9781581133912
Publication status: Published - 2001
Externally published: Yes
Event: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001) - San Francisco, CA, United States
Duration: Aug 26 2001 to Aug 29 2001

Publication series

Name: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001)
Country: United States
City: San Francisco, CA
Period: 8/26/01 to 8/29/01

All Science Journal Classification (ASJC) codes

  • Engineering(all)
