TY - JOUR
T1 - ALPHLARD
T2 - A Bayesian method for analyzing HLA genes from whole genome sequence data
AU - Hayashi, Shuto
AU - Yamaguchi, Rui
AU - Mizuno, Shinichi
AU - Komura, Mitsuhiro
AU - Miyano, Satoru
AU - Nakagawa, Hidewaki
AU - Imoto, Seiya
N1 - Funding Information:
This work was supported by Japan Society for the Promotion of Science (15H02775 and 15H05912).
Publisher Copyright:
© 2018 The Author(s).
PY - 2018/11/1
Y1 - 2018/11/1
AB - Background: Although human leukocyte antigen (HLA) genotyping based on amplicon, whole exome sequence (WES), and RNA sequence data has been achieved in recent years, accurate genotyping from whole genome sequence (WGS) data remains a challenge because of the low sequencing depth. Furthermore, there is no method to identify the sequences of unknown HLA types not registered in HLA databases. Results: We developed a Bayesian model, called ALPHLARD, that collects reads potentially generated from HLA genes and accurately determines a pair of HLA types for each of the HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1, and -DRB1 genes at 3rd field resolution. Furthermore, ALPHLARD can detect rare germline variants not stored in HLA databases and call somatic mutations from paired normal and tumor sequence data. We illustrate the capability of ALPHLARD using 253 WES datasets and 25 WGS datasets from Illumina platforms. Compared against HLA genotyping results from sequencing-based typing (SBT) and amplicon sequencing, ALPHLARD achieved accuracies of 98.8% for WES data and 98.5% for WGS data at 2nd field resolution. We also detected three somatic point mutations and one case of loss of heterozygosity in the HLA genes from the WGS data. Conclusions: ALPHLARD showed good performance for HLA genotyping even from low-coverage data. It also has the potential to detect rare germline variants and somatic mutations in HLA genes. It could help fill the current gaps in HLA reference databases and unveil the immunological significance of somatic mutations identified in HLA genes.
UR - http://www.scopus.com/inward/record.url?scp=85055913555&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055913555&partnerID=8YFLogxK
U2 - 10.1186/s12864-018-5169-9
DO - 10.1186/s12864-018-5169-9
M3 - Article
C2 - 30384854
AN - SCOPUS:85055913555
SN - 1471-2164
VL - 19
JO - BMC Genomics
JF - BMC Genomics
IS - 1
M1 - 790
ER -