TY - GEN

T1 - Toward drawing an atlas of hypothesis classes

T2 - 5th International Conference on Discovery Science, DS 2002

AU - Maruyama, Osamu

AU - Shoudai, Takayoshi

AU - Miyano, Satoru

PY - 2002

Y1 - 2002

N2 - Computational knowledge discovery can be considered to be a complicated human activity concerned with searching for something new from data with computer systems. The optimization of the entire process of computational knowledge discovery is a big challenge in computer science. If we had an atlas of hypothesis classes which describes prior and basic knowledge on relative relationship between the hypothesis classes, it would be helpful in selecting hypothesis classes to be searched in discovery processes. In this paper, to give a foundation for an atlas of various classes of hypotheses, we have defined a measure of approximation of a hypothesis class C1 to another class C2. The hypotheses we consider here are restricted to m-ary Boolean functions. For 0 ≤ ε ≤ 1, we say that C1 is (1−ε)-approximated to C2 if, for every distribution D over {0, 1}^m, and for each hypothesis h1 ∈ C1, there exists a hypothesis h2 ∈ C2 such that, with the probability at most ε, we have h1(x) ≠ h2(x) where x ∈ {0, 1}^m is drawn randomly and independently according to D. Thus, we can use the approximation ratio of C1 to C2 as an index of how similar C1 is to C2. We discuss lower bounds of the approximation ratios among representative classes of hypotheses like decision lists, decision trees, linear discriminant functions and so on. This prior knowledge would come in useful when selecting hypothesis classes in the initial stage and the sequential stages involved in the entire discovery process.

AB - Computational knowledge discovery can be considered to be a complicated human activity concerned with searching for something new from data with computer systems. The optimization of the entire process of computational knowledge discovery is a big challenge in computer science. If we had an atlas of hypothesis classes which describes prior and basic knowledge on relative relationship between the hypothesis classes, it would be helpful in selecting hypothesis classes to be searched in discovery processes. In this paper, to give a foundation for an atlas of various classes of hypotheses, we have defined a measure of approximation of a hypothesis class C1 to another class C2. The hypotheses we consider here are restricted to m-ary Boolean functions. For 0 ≤ ε ≤ 1, we say that C1 is (1−ε)-approximated to C2 if, for every distribution D over {0, 1}^m, and for each hypothesis h1 ∈ C1, there exists a hypothesis h2 ∈ C2 such that, with the probability at most ε, we have h1(x) ≠ h2(x) where x ∈ {0, 1}^m is drawn randomly and independently according to D. Thus, we can use the approximation ratio of C1 to C2 as an index of how similar C1 is to C2. We discuss lower bounds of the approximation ratios among representative classes of hypotheses like decision lists, decision trees, linear discriminant functions and so on. This prior knowledge would come in useful when selecting hypothesis classes in the initial stage and the sequential stages involved in the entire discovery process.

UR - http://www.scopus.com/inward/record.url?scp=84949755076&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949755076&partnerID=8YFLogxK

U2 - 10.1007/3-540-36182-0_20

DO - 10.1007/3-540-36182-0_20

M3 - Conference contribution

AN - SCOPUS:84949755076

SN - 3540001883

SN - 9783540001881

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 220

EP - 232

BT - Discovery Science - 5th International Conference, DS 2002, Proceedings

A2 - Lange, Steffen

A2 - Satoh, Ken

A2 - Smith, Carl H.

PB - Springer Verlag

Y2 - 24 November 2002 through 26 November 2002

ER -