As more genome sequences become available, comparative genomics is becoming increasingly important. Its basic approach is to find similar subsequences in the sequences of different species and then examine in detail the differences among the similar parts found. Instead of focusing on similar parts, this paper is devoted to finding different parts directly from whole DNA sequences. This is challenging because the large size of the sequences prohibits computationally expensive methods and because there exist so many differences in a genome-wide comparison. To cope with this, we exploit the algorithm in (Ikeda and Suzuki, 2009), which finds unexpected, infrequent patterns. However, the patterns it finds have not previously been evaluated from a biological viewpoint. In this paper, we show that patterns discovered by the algorithm from bacterial genome sequences match biological features, such as RNA and transposons, well. Treating these features as the relevant regions, we compute F-measure values and show that some species achieve about 90%, which is one order of magnitude better than the patterns found by an existing method. Thus, we conclude that the algorithm can find these infrequent but biologically meaningful patterns in genome-wide sequences.
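The F-measure evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how found patterns are matched against relevant regions, so the code below assumes a base-pair-level comparison of half-open intervals `(start, end)`; this assumption, and the function names `merge`, `covered`, `overlap`, and `f_measure`, are illustrative and not taken from the paper.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered(intervals):
    """Number of positions covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def overlap(a, b):
    """Number of positions covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def f_measure(found, relevant):
    """Harmonic mean of precision and recall.

    Assumption: both are measured in covered positions (base pairs).
    """
    tp = overlap(found, relevant)
    if tp == 0:
        return 0.0
    precision = tp / covered(found)
    recall = tp / covered(relevant)
    return 2 * precision * recall / (precision + recall)
```

For example, under these assumptions a found pattern at `(0, 100)` scored against a relevant region at `(50, 150)` shares 50 positions, giving precision 0.5, recall 0.5, and an F-measure of 0.5. A region-level definition (counting whole features as hit or missed) would be an equally plausible reading and would give different values.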
|Title of host publication||BIOINFORMATICS 2013 - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms|
|Publication status||Published - May 27, 2013|
|Event||International Conference on Bioinformatics Models, Methods and Algorithms, BIOINFORMATICS 2013 - Barcelona, Spain|
Duration: Feb 11, 2013 → Feb 14, 2013
|Other||International Conference on Bioinformatics Models, Methods and Algorithms, BIOINFORMATICS 2013|
|Period||2/11/13 → 2/14/13|