Infrequent, unexpected, and contrast pattern discovery from bacterial genomes by genome-wide comparative analysis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

With the growing number of available sequences, comparative genomics is becoming increasingly important. Its basic approach is to find similar subsequences in the sequences of different species and then examine in detail the differences among the similar parts found. Instead of focusing on similar parts, this paper is devoted to finding the differing parts directly from whole DNA sequences. This is challenging because the large size of the data prohibits computationally expensive methods and because there exist so many differences in a genome-wide comparison. To cope with this, we exploit the algorithm in (Ikeda and Suzuki, 2009), which finds unexpected, infrequent patterns. However, the patterns it finds had not been evaluated from the viewpoint of biology. In this paper, we show that patterns discovered by the algorithm from bacterial genome sequences match biological features, such as RNA and transposons, well. Therefore, treating these features as relevant regions, we compute F-measure values and show that some species achieve about 90%, which is one order of magnitude better than patterns found by an existing method. Thus, we conclude that the algorithm can find these infrequent but biologically meaningful patterns from genome-wide sequences.
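
The F-measure evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that discovered patterns and annotated features are both given as half-open coordinate intervals and that overlap is counted per base position; the abstract does not state the unit of comparison, so this is only one plausible reading. The function names and the toy coordinates are hypothetical and are not taken from the paper.

```python
def covered_positions(intervals):
    """Return the set of positions covered by half-open (start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def f_measure(predicted, relevant):
    """Return (precision, recall, F) computed over covered positions.

    Assumption: overlap is counted per base position; the paper may
    use a different unit (for example, whole annotated features).
    """
    pred = covered_positions(predicted)
    rel = covered_positions(relevant)
    overlap = len(pred & rel)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(rel) if rel else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical toy coordinates, not data from the paper.
predicted_regions = [(100, 200), (500, 560)]   # regions flagged by a pattern finder
relevant_regions = [(120, 220), (500, 550)]    # annotated features, e.g. RNA genes
precision, recall, f = f_measure(predicted_regions, relevant_regions)
print(f"precision={precision:.3f} recall={recall:.3f} F={f:.3f}")
```

Storing positions in a set keeps the sketch short; for real bacterial genomes of several million bases, an interval-merging approach would use far less memory.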

Original language: English
Title of host publication: BIOINFORMATICS 2013 - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms
Pages: 308-311
Number of pages: 4
Publication status: Published - May 27 2013
Event: International Conference on Bioinformatics Models, Methods and Algorithms, BIOINFORMATICS 2013 - Barcelona, Spain
Duration: Feb 11 2013 → Feb 14 2013

Other

Other: International Conference on Bioinformatics Models, Methods and Algorithms, BIOINFORMATICS 2013
Country: Spain
City: Barcelona
Period: 2/11/13 → 2/14/13

Fingerprint

Bacterial Genomes
Pattern Discovery
Comparative Analysis
Genome
Genes
DNA sequences
Genomics
RNA
Comparative Genomics
Subsequence
DNA Sequence
Biology

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Health Informatics
  • Modelling and Simulation

Cite this

Ikeda, D., Maruyama, O., & Kuhara, S. (2013). Infrequent, unexpected, and contrast pattern discovery from bacterial genomes by genome-wide comparative analysis. In BIOINFORMATICS 2013 - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (pp. 308-311)

Infrequent, unexpected, and contrast pattern discovery from bacterial genomes by genome-wide comparative analysis. / Ikeda, Daisuke; Maruyama, Osamu; Kuhara, Satoru. BIOINFORMATICS 2013 - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms. 2013. p. 308-311.

Ikeda, D, Maruyama, O & Kuhara, S 2013, Infrequent, unexpected, and contrast pattern discovery from bacterial genomes by genome-wide comparative analysis. in BIOINFORMATICS 2013 - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms. pp. 308-311, International Conference on Bioinformatics Models, Methods and Algorithms, BIOINFORMATICS 2013, Barcelona, Spain, 2/11/13.
Ikeda D, Maruyama O, Kuhara S. Infrequent, unexpected, and contrast pattern discovery from bacterial genomes by genome-wide comparative analysis. In BIOINFORMATICS 2013 - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms. 2013. p. 308-311
Ikeda, Daisuke ; Maruyama, Osamu ; Kuhara, Satoru. / Infrequent, unexpected, and contrast pattern discovery from bacterial genomes by genome-wide comparative analysis. BIOINFORMATICS 2013 - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms. 2013. pp. 308-311
@inproceedings{ad1f1b91e7034a1f92d07dec95174f55,
title = "Infrequent, unexpected, and contrast pattern discovery from bacterial genomes by genome-wide comparative analysis",
author = "Daisuke Ikeda and Osamu Maruyama and Satoru Kuhara",
year = "2013",
month = "5",
day = "27",
language = "English",
isbn = "9789898565358",
pages = "308--311",
booktitle = "BIOINFORMATICS 2013 - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms",

}

TY - GEN

T1 - Infrequent, unexpected, and contrast pattern discovery from bacterial genomes by genome-wide comparative analysis

AU - Ikeda, Daisuke

AU - Maruyama, Osamu

AU - Kuhara, Satoru

PY - 2013/5/27

Y1 - 2013/5/27

UR - http://www.scopus.com/inward/record.url?scp=84878002748&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84878002748&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84878002748

SN - 9789898565358

SP - 308

EP - 311

BT - BIOINFORMATICS 2013 - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms

ER -