Extensive feature detection of N-terminal protein sorting signals

Hideo Bannai, Yoshinori Tamada, Osamu Maruyama, Kenta Nakai, Satoru Miyano

Research output: Contribution to journal › Article

537 Citations (Scopus)

Abstract

Motivation: The prediction of localization sites of various proteins is an important and challenging problem in the field of molecular biology. TargetP, by Emanuelsson et al. (J. Mol. Biol., 300, 1005-1016, 2000), is a neural network-based system which is currently the best predictor in the literature for N-terminal sorting signals. One drawback of neural networks, however, is that it is generally difficult to understand and interpret how and why they make such predictions. In this paper, we aim to generate simple and interpretable rules as predictors, and still achieve a practical prediction accuracy. We adopt an approach which consists of an extensive search for simple rules and various attributes which is partially guided by human intuition.

Results: We have succeeded in finding rules whose prediction accuracies come close to that of TargetP, while still retaining a very simple and interpretable form. We also discuss and interpret the discovered rules.

Original language: English
Pages (from-to): 298-305
Number of pages: 8
Journal: Bioinformatics
Volume: 18
Issue number: 2
DOI: 10.1093/bioinformatics/18.2.298
Publication status: Published - Jan 1 2002

Fingerprint

Intuition
Feature Detection
Protein Sorting Signals
Sorting
Molecular Biology
Protein
Prediction
Predictors
Proteins
Neural Networks
Attribute

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Extensive feature detection of N-terminal protein sorting signals. / Bannai, Hideo; Tamada, Yoshinori; Maruyama, Osamu; Nakai, Kenta; Miyano, Satoru.

In: Bioinformatics, Vol. 18, No. 2, 01.01.2002, p. 298-305.

Bannai, Hideo ; Tamada, Yoshinori ; Maruyama, Osamu ; Nakai, Kenta ; Miyano, Satoru. / Extensive feature detection of N-terminal protein sorting signals. In: Bioinformatics. 2002 ; Vol. 18, No. 2. pp. 298-305.
@article{03384c6c7a694868adca485bfb58eca5,
title = "Extensive feature detection of N-terminal protein sorting signals",
author = "Hideo Bannai and Yoshinori Tamada and Osamu Maruyama and Kenta Nakai and Satoru Miyano",
year = "2002",
month = "1",
day = "1",
doi = "10.1093/bioinformatics/18.2.298",
language = "English",
volume = "18",
pages = "298--305",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR

T1 - Extensive feature detection of N-terminal protein sorting signals

AU - Bannai, Hideo

AU - Tamada, Yoshinori

AU - Maruyama, Osamu

AU - Nakai, Kenta

AU - Miyano, Satoru

PY - 2002/1/1

Y1 - 2002/1/1

UR - http://www.scopus.com/inward/record.url?scp=0036187913&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036187913&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/18.2.298

DO - 10.1093/bioinformatics/18.2.298

M3 - Article

C2 - 11847077

AN - SCOPUS:0036187913

VL - 18

SP - 298

EP - 305

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 2

ER -