TY - GEN
T1 - Measuring over-generalization in the minimal multiple generalizations of biosequences
AU - Ng, Yen Kaow
AU - Ono, Hirotaka
AU - Shinohara, Takeshi
PY - 2005
Y1 - 2005
N2 - We consider the problem of finding a set of patterns that best characterizes a set of strings. To this end, Arimura et al. [3] considered the use of minimal multiple generalizations (mmgs) for such characterizations. Given any sample set, the mmgs are, roughly speaking, the most (syntactically) specific sets of languages containing the sample within a given class of languages. Takae et al. [17] found the mmgs of the class of pattern languages [1] which includes so-called sort symbols to be fairly accurate as predictors for signal peptides. We first reproduce their results using updated data. Then, by using a measure for estimating the level of over-generalization made by the mmgs, we show results that explain the high accuracy resulting from the use of sort symbols, and discuss how better results can be obtained. The measure that we suggest here can also be applied to other types of patterns, e.g. the PROSITE patterns [4].
AB - We consider the problem of finding a set of patterns that best characterizes a set of strings. To this end, Arimura et al. [3] considered the use of minimal multiple generalizations (mmgs) for such characterizations. Given any sample set, the mmgs are, roughly speaking, the most (syntactically) specific sets of languages containing the sample within a given class of languages. Takae et al. [17] found the mmgs of the class of pattern languages [1] which includes so-called sort symbols to be fairly accurate as predictors for signal peptides. We first reproduce their results using updated data. Then, by using a measure for estimating the level of over-generalization made by the mmgs, we show results that explain the high accuracy resulting from the use of sort symbols, and discuss how better results can be obtained. The measure that we suggest here can also be applied to other types of patterns, e.g. the PROSITE patterns [4].
UR - http://www.scopus.com/inward/record.url?scp=33745326763&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745326763&partnerID=8YFLogxK
U2 - 10.1007/11563983_16
DO - 10.1007/11563983_16
M3 - Conference contribution
AN - SCOPUS:33745326763
SN - 3540292306
SN - 9783540292302
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 176
EP - 188
BT - Discovery Science - 8th International Conference, DS 2005, Proceedings
PB - Springer Verlag
T2 - 8th International Conference on Discovery Science, DS 2005
Y2 - 8 October 2005 through 11 October 2005
ER -