TY - GEN
T1 - Toward drawing an atlas of hypothesis classes
T2 - 5th International Conference on Discovery Science, DS 2002
AU - Maruyama, Osamu
AU - Shoudai, Takayoshi
AU - Miyano, Satoru
PY - 2002
Y1 - 2002
N2 - Computational knowledge discovery can be regarded as a complex human activity concerned with searching for something new in data using computer systems. Optimizing the entire process of computational knowledge discovery is a major challenge in computer science. If we had an atlas of hypothesis classes describing prior, basic knowledge of the relationships between the hypothesis classes, it would help in selecting the hypothesis classes to be searched in discovery processes. In this paper, to lay a foundation for an atlas of various classes of hypotheses, we define a measure of approximation of a hypothesis class C1 to another class C2. The hypotheses we consider here are restricted to m-ary Boolean functions. For 0 ≤ ε ≤ 1, we say that C1 is (1−ε)-approximated to C2 if, for every distribution D over {0, 1}^m and for each hypothesis h1 ∈ C1, there exists a hypothesis h2 ∈ C2 such that h1(x) ≠ h2(x) with probability at most ε, where x ∈ {0, 1}^m is drawn randomly and independently according to D. Thus, we can use the approximation ratio of C1 to C2 as an index of how similar C1 is to C2. We discuss lower bounds on the approximation ratios among representative classes of hypotheses such as decision lists, decision trees, and linear discriminant functions. This prior knowledge would be useful when selecting hypothesis classes in the initial stage and the sequential stages involved in the entire discovery process.
AB - Computational knowledge discovery can be regarded as a complex human activity concerned with searching for something new in data using computer systems. Optimizing the entire process of computational knowledge discovery is a major challenge in computer science. If we had an atlas of hypothesis classes describing prior, basic knowledge of the relationships between the hypothesis classes, it would help in selecting the hypothesis classes to be searched in discovery processes. In this paper, to lay a foundation for an atlas of various classes of hypotheses, we define a measure of approximation of a hypothesis class C1 to another class C2. The hypotheses we consider here are restricted to m-ary Boolean functions. For 0 ≤ ε ≤ 1, we say that C1 is (1−ε)-approximated to C2 if, for every distribution D over {0, 1}^m and for each hypothesis h1 ∈ C1, there exists a hypothesis h2 ∈ C2 such that h1(x) ≠ h2(x) with probability at most ε, where x ∈ {0, 1}^m is drawn randomly and independently according to D. Thus, we can use the approximation ratio of C1 to C2 as an index of how similar C1 is to C2. We discuss lower bounds on the approximation ratios among representative classes of hypotheses such as decision lists, decision trees, and linear discriminant functions. This prior knowledge would be useful when selecting hypothesis classes in the initial stage and the sequential stages involved in the entire discovery process.
UR - http://www.scopus.com/inward/record.url?scp=84949755076&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84949755076&partnerID=8YFLogxK
U2 - 10.1007/3-540-36182-0_20
DO - 10.1007/3-540-36182-0_20
M3 - Conference contribution
AN - SCOPUS:84949755076
SN - 3540001883
SN - 9783540001881
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 220
EP - 232
BT - Discovery Science - 5th International Conference, DS 2002, Proceedings
A2 - Lange, Steffen
A2 - Satoh, Ken
A2 - Smith, Carl H.
PB - Springer Verlag
Y2 - 24 November 2002 through 26 November 2002
ER -