Discovering outlier filtering rules from unlabeled data

Kenji Yamanishi, Junnichi Takeuchi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

69 Citations (Scopus)

Abstract

This paper is concerned with the problem of detecting outliers in unlabeled data. In prior work we developed SmartSifter, an on-line outlier detection algorithm based on unsupervised learning from data. Building on SmartSifter, this paper presents a new framework for outlier filtering that uses supervised and unsupervised learning techniques iteratively in order to make the detection process more effective and more understandable. The outline of the framework is as follows. In the first round, for an initial dataset, we run SmartSifter to give each datum a score, with a high score indicating a high possibility of being an outlier. Next, we create labeled examples by giving positive labels to a number of the highest-scored data and negative labels to a number of the lowest-scored data. We then construct an outlier filtering rule by supervised learning from these examples; the rule is generated based on the principle of minimizing extended stochastic complexity. In the second round, for a new dataset, we filter the data using the constructed rule, then run SmartSifter again on the filtered data to evaluate them in order to update the filtering rule. Applying our framework to network intrusion detection, we demonstrate that 1) it can significantly improve the accuracy of SmartSifter, and 2) outlier filtering rules can help the user discover a general pattern of an outlier group.
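The two-round loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `score` is a plain z-score stand-in for SmartSifter, `learn_rule` fits a one-dimensional threshold rule by counting training errors rather than by minimizing extended stochastic complexity, and "the filtered data" is taken to mean the points the rule did not flag, which the abstract does not spell out. All function names, the rule representation, and the toy numbers are hypothetical.

```python
from statistics import mean, pstdev


def score(data):
    """Stand-in scorer (absolute z-score); this is NOT SmartSifter."""
    mu = mean(data)
    sigma = pstdev(data) or 1.0  # avoid division by zero on constant data
    return [abs(x - mu) / sigma for x in data]


def make_examples(data, k):
    """Label the k highest-scored points 1 (outlier) and the k lowest 0."""
    ranked = sorted(zip(score(data), data))
    return [(x, 0) for _, x in ranked[:k]] + [(x, 1) for _, x in ranked[-k:]]


def fires(rule, x):
    """A rule is (threshold, sign); it flags x when sign * (x - threshold) >= 0."""
    threshold, sign = rule
    return sign * (x - threshold) >= 0


def learn_rule(examples):
    """Pick the threshold rule with the fewest training errors.

    Plain error counting is an assumption made for brevity; the paper
    minimizes extended stochastic complexity instead.
    """
    thresholds = sorted({x for x, _ in examples})
    candidates = [(t, s) for t in thresholds for s in (1, -1)]
    return min(
        candidates,
        key=lambda r: sum(fires(r, x) != bool(y) for x, y in examples),
    )


# Round 1: score the initial data, label the extremes, learn a rule.
round1 = [10, 11, 9, 10, 12, 10, 11, 9, 50, 55]
examples = make_examples(round1, k=2)
rule1 = learn_rule(examples)

# Round 2: filter new data with the rule, re-score what remains,
# add the newly labeled extremes, and update the rule.
round2 = [10, 9, 11, 52, 10, 30, 12, 10]
passed = [x for x in round2 if not fires(rule1, x)]
examples += make_examples(passed, k=1)
rule2 = learn_rule(examples)

print("rule after round 1:", rule1)
print("rule after round 2:", rule2)
```

In this toy run the first rule flags only the very large values, and the second round lowers the threshold once the re-scored remainder exposes a milder outlier. The learned rule is also readable as a plain condition on the data, which is the interpretability benefit the abstract points to.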

Original language: English
Title of host publication: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: F. Provost, R. Srikant, M. Schkolnick, D. Lee
Pages: 389-394
Number of pages: 6
Publication status: Published - Dec 1 2001
Event: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001) - San Francisco, CA, United States
Duration: Aug 26 2001 to Aug 29 2001

Publication series

Name: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001)
Country: United States
City: San Francisco, CA
Period: 8/26/01 to 8/29/01

Fingerprint

Unsupervised learning
Supervised learning
Labels
Intrusion detection

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Yamanishi, K., & Takeuchi, J. (2001). Discovering outlier filtering rules from unlabeled data. In F. Provost, R. Srikant, M. Schkolnick, & D. Lee (Eds.), Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 389-394). (Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).

@inproceedings{3afc9bd96257480cad3fb574d3047cf1,
title = "Discovering outlier filtering rules from unlabeled data",
author = "Kenji Yamanishi and Junnichi Takeuchi",
year = "2001",
month = "12",
day = "1",
language = "English",
isbn = "158113391X",
series = "Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
pages = "389--394",
editor = "F. Provost and R. Srikant and M. Schkolnick and D. Lee",
booktitle = "Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Discovering outlier filtering rules from unlabeled data

AU - Yamanishi, Kenji

AU - Takeuchi, Junnichi

PY - 2001/12/1

Y1 - 2001/12/1

UR - http://www.scopus.com/inward/record.url?scp=0035788911&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0035788911&partnerID=8YFLogxK

M3 - Conference contribution

SN - 158113391X

T3 - Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 389

EP - 394

BT - Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

A2 - Provost, F.

A2 - Srikant, R.

A2 - Schkolnick, M.

A2 - Lee, D.

ER -