Mining complex genotypic features for predicting HIV-1 drug resistance

Hiroto Saigo, Takeaki Uno, Koji Tsuda

Research output: Contribution to journal › Article

34 Citations (Scopus)

Abstract

Motivation: Human immunodeficiency virus type 1 (HIV-1) evolves in the human body, and its exposure to a drug often causes mutations that enhance resistance against that drug. To design an effective pharmacotherapy for an individual patient, it is important to accurately predict drug resistance from genotype data. Notably, resistance is not simply the sum of the effects of all mutations. Structural biological studies suggest that associations between mutations are crucial: even if mutation A or mutation B alone does not affect resistance, a significant change might occur when the two mutations are present together. Linear regression methods cannot take such associations into account, while decision tree methods can reveal only limited associations. Kernel methods and neural networks implicitly use all possible associations for prediction, but cannot select salient associations explicitly.

Results: Our method, itemset boosting, performs linear regression in the complete space of the power set of mutations. It implements a forward feature selection procedure in which, at each iteration, one mutation combination is found by an efficient branch-and-bound search. The method considers all possible combinations and explicitly reports the salient associations. In experiments, our method worked particularly well for predicting resistance to nucleoside reverse transcriptase inhibitors (NRTIs). Furthermore, it successfully recovered many mutation associations known in the biological literature.
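
The forward selection over mutation combinations described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it swaps in a plain greedy least-squares update without an intercept, and the function names (best_itemset, fit, predict), the max_size cap, the toy mutations A, B, C and the resistance values are all assumptions made for illustration. Each candidate mutation set is scored by how much squared error its indicator feature would remove from the current residual, and a branch is pruned when no superset could beat the best score found so far.

```python
def best_itemset(transactions, residual, items, max_size):
    """Branch-and-bound search for the mutation set that best fits the residual."""
    best = {"gain": 0.0, "itemset": None, "support": []}

    def expand(itemset, support, start):
        for k in range(start, len(items)):
            # Samples that carry every mutation of the extended itemset.
            covered = [i for i in support if items[k] in transactions[i]]
            if not covered:
                continue
            candidate = itemset + (items[k],)
            total = sum(residual[i] for i in covered)
            # Squared-error reduction from the best single coefficient.
            gain = total * total / len(covered)
            if gain > best["gain"]:
                best.update(gain=gain, itemset=candidate, support=covered)
            # A superset covers a subset of `covered`, so its gain cannot
            # exceed max(pos, neg) ** 2; prune if that cannot beat the best.
            pos = sum(residual[i] for i in covered if residual[i] > 0)
            neg = -sum(residual[i] for i in covered if residual[i] < 0)
            if len(candidate) < max_size and max(pos, neg) ** 2 > best["gain"]:
                expand(candidate, covered, k + 1)

    expand((), list(range(len(transactions))), 0)
    return best


def fit(transactions, y, n_iter=5, max_size=3):
    """Greedy forward selection: add one mutation combination per iteration."""
    items = sorted({m for t in transactions for m in t})
    pred = [0.0] * len(y)
    weights = {}
    for _ in range(n_iter):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        found = best_itemset(transactions, residual, items, max_size)
        if found["itemset"] is None:
            break  # no combination reduces the error any further
        support = found["support"]
        # Least-squares coefficient of a 0/1 feature is the mean residual.
        step = sum(residual[i] for i in support) / len(support)
        weights[found["itemset"]] = weights.get(found["itemset"], 0.0) + step
        for i in support:
            pred[i] += step
    return weights


def predict(weights, transaction):
    """Sum the weights of all selected itemsets contained in the genotype."""
    return sum(w for s, w in weights.items() if set(s) <= transaction)


# Hypothetical toy data: A or B alone has no effect, A and B together do.
transactions = [{"A"}, {"B"}, {"A", "B"}, {"A", "B", "C"}, {"C"}, set()]
y = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]

model = fit(transactions, y)
print(model)  # {('A', 'B'): 1.0}
```

On this toy data the sketch selects only the pair (A, B), so a genotype carrying A alone is predicted at 0 while one carrying both mutations is predicted at 1.0. The pruning bound used here is deliberately loose; it is valid but prunes less than a tighter bound would.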

Original language: English
Pages (from-to): 2455-2462
Number of pages: 8
Journal: Bioinformatics
Volume: 23
Issue number: 18
DOI: 10.1093/bioinformatics/btm353
Publication status: Published - Sep 15 2007
Externally published: Yes

Fingerprint

Drug Resistance
Viruses
Linear regression
Virus
HIV-1
Mining
Mutation
Drug therapy
Nucleotides
Decision trees
Pharmaceutical Preparations
Feature extraction
Reverse Transcriptase Inhibitors
Neural networks
Linear Models
Drugs
Experiments
Power set
Human

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Mining complex genotypic features for predicting HIV-1 drug resistance. / Saigo, Hiroto; Uno, Takeaki; Tsuda, Koji.

In: Bioinformatics, Vol. 23, No. 18, 15.09.2007, p. 2455-2462.

@article{91c9888f57214e1992b7402a5cf98126,
title = "Mining complex genotypic features for predicting HIV-1 drug resistance",
author = "Hiroto Saigo and Takeaki Uno and Koji Tsuda",
year = "2007",
month = "9",
day = "15",
doi = "10.1093/bioinformatics/btm353",
language = "English",
volume = "23",
pages = "2455--2462",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "18",

}

TY - JOUR

T1 - Mining complex genotypic features for predicting HIV-1 drug resistance

AU - Saigo, Hiroto

AU - Uno, Takeaki

AU - Tsuda, Koji

PY - 2007/9/15

Y1 - 2007/9/15

UR - http://www.scopus.com/inward/record.url?scp=34548720347&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548720347&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btm353

DO - 10.1093/bioinformatics/btm353

M3 - Article

C2 - 17698858

AN - SCOPUS:34548720347

VL - 23

SP - 2455

EP - 2462

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 18

ER -