Finding all longest common segments in protein structures efficiently

Yen Kaow Ng, Linzhi Yin, Hirotaka Ono, Shuai Cheng Li

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

The Local/Global Alignment (Zemla, 2003), or LGA, is a popular method for the comparison of protein structures. One of the two components of LGA requires us to compute the longest common contiguous segments between two protein structures. That is, given two structures $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_n)$ where $a_k, b_k \in \mathbb{R}^3$, we are to find, among all the segments $f = (a_i, \ldots, a_j)$ and $g = (b_i, \ldots, b_j)$ that fulfill a certain criterion regarding their similarity, those of the maximum length. We consider the following criteria: (1) the root mean squared deviation (RMSD) between $f$ and $g$ is to be within a given $t \in \mathbb{R}$; (2) $f$ and $g$ can be superposed such that for each $k$, $i \le k \le j$, $\|a_k - b_k\| \le t$ for a given $t \in \mathbb{R}$. We give an algorithm of $O(n \log n + nl)$ time complexity when the first requirement applies, where $l$ is the maximum length of the segments fulfilling the criterion. We show an FPTAS which, for any $\epsilon \in \mathbb{R}$, finds a segment of length at least $l$, but of RMSD up to $(1 + \epsilon)t$, in $O(n \log n + n/\epsilon)$ time. We propose an FPTAS which, for any given $\epsilon \in \mathbb{R}$, finds all the segments $f$ and $g$ of the maximum length which can be superposed such that for each $k$, $i \le k \le j$, $\|a_k - b_k\| < (1 + \epsilon)t$, thus fulfilling the second requirement approximately. The algorithm has a time complexity of $O(n \log^2 n / \epsilon^5)$ when consecutive points in $A$ are separated by the same distance (which is the case with protein structures). These worst-case runtime complexities are verified using C++ implementations of the algorithms, which we have made available at http://alcs.sourceforge.net/.
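To make the problem statement concrete, the following is a minimal brute-force sketch of the search under criterion (1). It is not the paper's algorithm: it runs in quadratic time, and it assumes that A and B are already superposed, so the RMSD is taken on the coordinates as given and no rotation or translation is optimized. The function name `longest_common_segments` and the 0-based start indices are illustrative choices, not taken from the paper or its C++ code.

```python
from itertools import accumulate
from math import dist


def longest_common_segments(A, B, t):
    """Return (length, starts) for the longest segments with RMSD <= t.

    Assumption (a simplification of criterion (1)): A and B are already
    superposed, so the RMSD of a segment is the root of the mean squared
    distance between paired points, with no superposition step.
    """
    if len(A) != len(B):
        raise ValueError("A and B must have the same number of points")
    n = len(A)
    # prefix[k] holds the sum of squared distances over the first k pairs,
    # so any segment's sum of squares is a difference of two entries.
    squared = [dist(a, b) ** 2 for a, b in zip(A, B)]
    prefix = [0.0] + list(accumulate(squared))
    # RMSD <= t is not monotone in the segment length, so every length is
    # tried, from longest to shortest, stopping at the first that succeeds.
    for length in range(n, 0, -1):
        starts = [
            i
            for i in range(n - length + 1)
            # RMSD <= t  <=>  sum of squares <= t^2 * length
            if prefix[i + length] - prefix[i] <= t * t * length
        ]
        if starts:
            return length, starts
    return 0, []
```

With five collinear points where only the third pair differs (by a distance of 3), a threshold of t = 0.5 yields two maximal segments of length 2, starting at indices 0 and 3. Handling the superposition that the criteria call for would require replacing the fixed-coordinate RMSD with one minimized over rigid motions for each candidate segment.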

Original language: English
Pages (from-to): 644-655
Number of pages: 12
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 12
Issue number: 3
DOI: 10.1109/TCBB.2014.2372782
Publication status: Published - May 1 2015

Fingerprint

Protein Structure
Proteins
FPTAS
Time Complexity
Deviation
Roots
Requirements
C++
Consecutive
Alignment

All Science Journal Classification (ASJC) codes

  • Biotechnology
  • Genetics
  • Applied Mathematics

Cite this

Finding all longest common segments in protein structures efficiently. / Ng, Yen Kaow; Yin, Linzhi; Ono, Hirotaka; Li, Shuai Cheng.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 12, No. 3, 01.05.2015, p. 644-655.

@article{fc5be0d5e9444af0b56bc174db3669ac,
title = "Finding all longest common segments in protein structures efficiently",
author = "Ng, {Yen Kaow} and Linzhi Yin and Hirotaka Ono and Li, {Shuai Cheng}",
year = "2015",
month = "5",
day = "1",
doi = "10.1109/TCBB.2014.2372782",
language = "English",
volume = "12",
pages = "644--655",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "3",

}

TY - JOUR

T1 - Finding all longest common segments in protein structures efficiently

AU - Ng, Yen Kaow

AU - Yin, Linzhi

AU - Ono, Hirotaka

AU - Li, Shuai Cheng

PY - 2015/5/1

Y1 - 2015/5/1

UR - http://www.scopus.com/inward/record.url?scp=84940383810&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84940383810&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2014.2372782

DO - 10.1109/TCBB.2014.2372782

M3 - Article

C2 - 26357275

AN - SCOPUS:84940383810

VL - 12

SP - 644

EP - 655

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

IS - 3

ER -