Evaluation of SNP calling methods for closely related bacterial isolates and a novel high-accuracy pipeline: BactSNP

Dai Yoshimura, Rei Kajitani, Yasuhiro Gotoh, Katsuyuki Katahira, Miki Okuno, Yoshitoshi Ogura, Tetsuya Hayashi, Takehiko Itoh

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Bacteria are highly diverse, even within a species; thus, there have been many studies which classify a single species into multiple types and analyze the genetic differences between them. Recently, the use of whole-genome sequencing (WGS) has been popular for these analyses, and the identification of single-nucleotide polymorphisms (SNPs) between isolates is the most basic analysis performed following WGS. The performance of SNP-calling methods therefore has a significant effect on the accuracy of downstream analyses, such as phylogenetic tree inference. In particular, when closely related isolates are analyzed, e.g. in outbreak investigations, some SNP callers tend to detect a high number of false-positive SNPs compared with the limited number of true SNPs among isolates. However, the performances of various SNP callers in such a situation have not been validated sufficiently. Here, we show the results of realistic benchmarks of commonly used SNP callers, revealing that some of them exhibit markedly low accuracy when target isolates are closely related. As an alternative, we developed a novel pipeline BactSNP, which utilizes both assembly and mapping information and is capable of highly accurate and sensitive SNP calling in a single step. BactSNP is also able to call SNPs among isolates when the reference genome is a draft one or even when the user does not input the reference genome. BactSNP is available at https://github.com/IEkAdN/BactSNP.

Original language: English
Journal: Microbial Genomics
Volume: 5
Issue number: 5
DOI: 10.1099/mgen.0.000261
Publication status: Published - May 1 2019

Fingerprint

Single Nucleotide Polymorphism
Genome
Benchmarking
Disease Outbreaks
Bacteria

All Science Journal Classification (ASJC) codes

  • Epidemiology
  • Microbiology
  • Molecular Biology
  • Genetics

Cite this

Evaluation of SNP calling methods for closely related bacterial isolates and a novel high-accuracy pipeline: BactSNP. / Yoshimura, Dai; Kajitani, Rei; Gotoh, Yasuhiro; Katahira, Katsuyuki; Okuno, Miki; Ogura, Yoshitoshi; Hayashi, Tetsuya; Itoh, Takehiko.

In: Microbial Genomics, Vol. 5, No. 5, 01.05.2019.

@article{39226c934d6d4bc2957551ee678b435c,
title = "Evaluation of SNP calling methods for closely related bacterial isolates and a novel high-accuracy pipeline: BactSNP",
abstract = "Bacteria are highly diverse, even within a species; thus, there have been many studies which classify a single species into multiple types and analyze the genetic differences between them. Recently, the use of whole-genome sequencing (WGS) has been popular for these analyses, and the identification of single-nucleotide polymorphisms (SNPs) between isolates is the most basic analysis performed following WGS. The performance of SNP-calling methods therefore has a significant effect on the accuracy of downstream analyses, such as phylogenetic tree inference. In particular, when closely related isolates are analyzed, e.g. in outbreak investigations, some SNP callers tend to detect a high number of false-positive SNPs compared with the limited number of true SNPs among isolates. However, the performances of various SNP callers in such a situation have not been validated sufficiently. Here, we show the results of realistic benchmarks of commonly used SNP callers, revealing that some of them exhibit markedly low accuracy when target isolates are closely related. As an alternative, we developed a novel pipeline BactSNP, which utilizes both assembly and mapping information and is capable of highly accurate and sensitive SNP calling in a single step. BactSNP is also able to call SNPs among isolates when the reference genome is a draft one or even when the user does not input the reference genome. BactSNP is available at https://github.com/IEkAdN/BactSNP.",
author = "Dai Yoshimura and Rei Kajitani and Yasuhiro Gotoh and Katsuyuki Katahira and Miki Okuno and Yoshitoshi Ogura and Tetsuya Hayashi and Takehiko Itoh",
year = "2019",
month = "5",
day = "1",
doi = "10.1099/mgen.0.000261",
language = "English",
volume = "5",
journal = "Microbial Genomics",
issn = "2057-5858",
publisher = "Microbiology Society",
number = "5",
}

TY - JOUR

T1 - Evaluation of SNP calling methods for closely related bacterial isolates and a novel high-accuracy pipeline

T2 - BactSNP

AU - Yoshimura, Dai

AU - Kajitani, Rei

AU - Gotoh, Yasuhiro

AU - Katahira, Katsuyuki

AU - Okuno, Miki

AU - Ogura, Yoshitoshi

AU - Hayashi, Tetsuya

AU - Itoh, Takehiko

PY - 2019/5/1

Y1 - 2019/5/1

N2 - Bacteria are highly diverse, even within a species; thus, there have been many studies which classify a single species into multiple types and analyze the genetic differences between them. Recently, the use of whole-genome sequencing (WGS) has been popular for these analyses, and the identification of single-nucleotide polymorphisms (SNPs) between isolates is the most basic analysis performed following WGS. The performance of SNP-calling methods therefore has a significant effect on the accuracy of downstream analyses, such as phylogenetic tree inference. In particular, when closely related isolates are analyzed, e.g. in outbreak investigations, some SNP callers tend to detect a high number of false-positive SNPs compared with the limited number of true SNPs among isolates. However, the performances of various SNP callers in such a situation have not been validated sufficiently. Here, we show the results of realistic benchmarks of commonly used SNP callers, revealing that some of them exhibit markedly low accuracy when target isolates are closely related. As an alternative, we developed a novel pipeline BactSNP, which utilizes both assembly and mapping information and is capable of highly accurate and sensitive SNP calling in a single step. BactSNP is also able to call SNPs among isolates when the reference genome is a draft one or even when the user does not input the reference genome. BactSNP is available at https://github.com/IEkAdN/BactSNP.

AB - Bacteria are highly diverse, even within a species; thus, there have been many studies which classify a single species into multiple types and analyze the genetic differences between them. Recently, the use of whole-genome sequencing (WGS) has been popular for these analyses, and the identification of single-nucleotide polymorphisms (SNPs) between isolates is the most basic analysis performed following WGS. The performance of SNP-calling methods therefore has a significant effect on the accuracy of downstream analyses, such as phylogenetic tree inference. In particular, when closely related isolates are analyzed, e.g. in outbreak investigations, some SNP callers tend to detect a high number of false-positive SNPs compared with the limited number of true SNPs among isolates. However, the performances of various SNP callers in such a situation have not been validated sufficiently. Here, we show the results of realistic benchmarks of commonly used SNP callers, revealing that some of them exhibit markedly low accuracy when target isolates are closely related. As an alternative, we developed a novel pipeline BactSNP, which utilizes both assembly and mapping information and is capable of highly accurate and sensitive SNP calling in a single step. BactSNP is also able to call SNPs among isolates when the reference genome is a draft one or even when the user does not input the reference genome. BactSNP is available at https://github.com/IEkAdN/BactSNP.

UR - http://www.scopus.com/inward/record.url?scp=85067268424&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85067268424&partnerID=8YFLogxK

U2 - 10.1099/mgen.0.000261

DO - 10.1099/mgen.0.000261

M3 - Article

C2 - 31099741

AN - SCOPUS:85067268424

VL - 5

JO - Microbial Genomics

JF - Microbial Genomics

SN - 2057-5858

IS - 5

ER -