An information theoretic approach to detection of minority subsets in database

Shin Ando, Einoshin Suzuki

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

13 Citations (Scopus)

Abstract

Detection of rare and exceptional occurrences in large-scale databases has become an important practice in the field of knowledge discovery and information retrieval. Many databases include a large amount of noise or irrelevant data, whose distribution often overlaps with the subsets of exceptional data containing useful knowledge. This paper addresses the problem of finding a small subset of "minority" data whose distribution overlaps with, but is exceptional to or inconsistent with, that of the majority of the database. In such a case, conventional distance-based or density-based approaches in Outlier Detection are ineffective due to their dependence on the structure of the majority or their need for critical parameters. We formalize the task as an estimation of a model of the minority subset which provides a simple description of the subset and yet maintains divergence from that of the majority. This estimation is formalized as a minimization problem using the information-theoretic framework of Rate Distortion theory. We further introduce conditions on the majority to derive an objective function which factorizes the property of the minority and the dependence on the structure of the majority. The proposed method shows improvements over conventional approaches on artificial data and a promising result in a document retrieval problem.

Original language: English
Title of host publication: Proceedings - Sixth International Conference on Data Mining, ICDM 2006
Pages: 11-20
Number of pages: 10
DOI
Publication status: Published - Dec 1 2006
Event: 6th International Conference on Data Mining, ICDM 2006 - Hong Kong, China
Duration: Dec 18 2006 to Dec 22 2006

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 6th International Conference on Data Mining, ICDM 2006
Country/Territory: China
City: Hong Kong
Period: 12/18/06 to 12/22/06

All Science Journal Classification (ASJC) codes

  • General Engineering
