TY - JOUR
T1 - MetaPlatanus
T2 - a metagenome assembler that combines long-range sequence links and species-specific features
AU - Kajitani, Rei
AU - Noguchi, Hideki
AU - Gotoh, Yasuhiro
AU - Ogura, Yoshitoshi
AU - Yoshimura, Dai
AU - Okuno, Miki
AU - Toyoda, Atsushi
AU - Kuwahara, Tomomi
AU - Hayashi, Tetsuya
AU - Itoh, Takehiko
N1 - Funding Information:
JSPS KAKENHI [JP16H06279 (PAGS), JP15H05979, JP18K19286, JP19H03206, JP20K15769, JP21K19211] from the Ministry of Education, Culture, Sports, Science and Technology of Japan; AMED [JP16gm6010003] from the Japan Agency for Medical Research and Development.
Publisher Copyright:
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2021/12/16
Y1 - 2021/12/16
N2 - De novo metagenome assembly is effective in assembling multiple draft genomes, including those of uncultured organisms. However, heterogeneity in the metagenome hinders assembly and introduces interspecies misassemblies that are deleterious for downstream analysis. To address this, we developed a hybrid metagenome assembler, MetaPlatanus. As its characteristic feature, it first assembles basic contigs from accurate short reads and then iteratively utilizes long-range sequence links, species-specific sequence compositions, and coverage depth. Binning information is also used to improve contiguity. Benchmarking using mock datasets consisting of known bacteria with long reads or mate pairs revealed the high contiguity of MetaPlatanus with few interspecies misassemblies. For published human gut data with nanopore reads from portable sequencers, MetaPlatanus assembled many biologically important elements, such as coding genes, gene clusters, viral sequences, and bacterial genomes that were more than half complete. In the benchmark with published human saliva data with high-throughput nanopore reads, the superiority of MetaPlatanus was considerably more evident. We found that some high-abundance bacterial genomes were assembled as near-complete only by MetaPlatanus. Furthermore, MetaPlatanus can circumvent the limitations of highly fragmented assemblies and frequent interspecies misassemblies obtained by the other tools. Overall, the study demonstrates that MetaPlatanus could be an effective approach for exploring large-scale structures in metagenomes.
UR - http://www.scopus.com/inward/record.url?scp=85122841243&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85122841243&partnerID=8YFLogxK
U2 - 10.1093/nar/gkab831
DO - 10.1093/nar/gkab831
M3 - Article
C2 - 34570223
AN - SCOPUS:85122841243
SN - 0305-1048
VL - 49
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 22
M1 - e130
ER -