Measuring over-generalization in the minimal multiple generalizations of biosequences

Yen Kaow Ng, Hirotaka Ono, Takeshi Shinohara

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

We consider the problem of finding a set of patterns that best characterizes a set of strings. To this end, Arimura et al. [3] considered the use of minimal multiple generalizations (mmgs) for such characterizations. Given any sample set, the mmgs are, roughly speaking, the most (syntactically) specific sets of languages, within a given class of languages, that contain the sample. Takae et al. [17] found the mmgs of the class of pattern languages [1] that includes so-called sort symbols to be fairly accurate predictors of signal peptides. We first reproduce their results using updated data. Then, using a measure that estimates the level of over-generalization made by the mmgs, we present results that explain the high accuracy obtained with sort symbols, and discuss how better results can be obtained. The measure that we suggest here can also be applied to other types of patterns, e.g. the PROSITE patterns [4].

Original language: English
Title of host publication: Discovery Science - 8th International Conference, DS 2005, Proceedings
Pages: 176-188
Number of pages: 13
Publication status: Published - Dec 1 2005
Event: 8th International Conference on Discovery Science, DS 2005, Singapore
Duration: Oct 8 2005 to Oct 11 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3735 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 8th International Conference on Discovery Science, DS 2005
Country: Singapore
Period: 10/8/05 to 10/11/05

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
