Worst case and a distribution-based case analyses of sampling for rule discovery based on generality and accuracy

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

In this paper, we propose two sampling theories for rule discovery based on generality and accuracy. The first theory concerns the worst case: it extends a preliminary version of PAC learning, which is a worst-case analysis for classification. In our analysis, a rule is defined as a probabilistic constraint on the true assignment to the class attribute for the corresponding examples, and we mainly analyze the case in which we try to avoid finding a bad rule. The effectiveness of our approach is demonstrated through examples of conjunction-rule discovery. The second theory concerns a distribution-based case: it gives the conditions under which a rule exceeds pre-specified thresholds for generality and accuracy with high reliability. The idea is to assume a two-dimensional normal distribution for two random variables and to obtain the conditions from their confidence region. This approach has been validated experimentally on 21 benchmark data sets from the machine learning community against conventional methods, each of which evaluates the reliability of generality. Related work on PAC learning, multiple comparison, and the analysis of association-rule discovery is also discussed.
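
For orientation, the classical PAC sample-size bound that worst-case analyses of this kind start from can be sketched as follows. This is the textbook bound for a finite hypothesis class and a consistent learner, m >= (ln|H| + ln(1/delta)) / epsilon, and not the extended analysis proposed in the paper. The function names are hypothetical, and counting 3^n conjunctions over n Boolean attributes is the usual textbook assumption rather than a figure taken from the article.

```python
import math


def pac_sample_size(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with m >= (ln|H| + ln(1/delta)) / epsilon.

    Textbook bound for a finite hypothesis class H and a consistent
    learner; an illustrative assumption, not the paper's own bound.
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    bound = (math.log(num_hypotheses) + math.log(1.0 / delta)) / epsilon
    return math.ceil(bound)


def conjunction_sample_size(num_attributes: int, epsilon: float, delta: float) -> int:
    """Bound for conjunctions over Boolean attributes.

    Assumes each attribute appears positively, negated, or not at all,
    which gives 3**num_attributes candidate conjunctions.
    """
    return pac_sample_size(3 ** num_attributes, epsilon, delta)


if __name__ == "__main__":
    # Example: 10 Boolean attributes, error at most 0.1 with probability 0.95.
    print(conjunction_sample_size(10, epsilon=0.1, delta=0.05))
```

The bound grows only linearly in the number of attributes and logarithmically in 1/delta, which is why a worst-case guarantee of this form stays practical for conjunction rules.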

Original language: English
Pages (from-to): 29-36
Number of pages: 8
Journal: Applied Intelligence
Volume: 22
Issue number: 1
Publication status: Published - Jan 1 2005
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence