Practical algorithms for pattern based linear regression

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We consider the problem of discovering the optimal pattern from a set of strings and associated numeric attribute values. The goodness of a pattern is measured by the correlation between the number of occurrences of the pattern in each string, and the numeric attribute value assigned to the string. We present two algorithms based on suffix trees, that can find the optimal substring pattern in O(Nn) and O(N^2) time, respectively, where n is the number of strings and N is their total length. We further present a general branch and bound strategy that can be used when considering more complex pattern classes. We also show that combining the O(N^2) algorithm and the branch and bound heuristic increases the efficiency of the algorithm considerably.
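As a rough illustration of the scoring described in the abstract, the following is a minimal brute-force sketch in Python. It is not one of the suffix-tree algorithms from the paper and does not reach their time bounds; it simply enumerates every substring and scores it. The function names, the use of overlapping occurrence counts, and the choice of the absolute Pearson correlation as the goodness measure are assumptions made for this example.

```python
from math import sqrt


def count_occurrences(pattern, text):
    """Count (possibly overlapping) occurrences of pattern in text."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_substring_pattern(strings, values):
    """Brute-force search over all substrings of the input strings.

    Returns the substring whose per-string occurrence counts have the
    largest absolute correlation with the numeric values, and that score.
    """
    candidates = {
        s[i:j]
        for s in strings
        for i in range(len(s))
        for j in range(i + 1, len(s) + 1)
    }
    best_pattern, best_score = None, -1.0
    for pattern in sorted(candidates):  # sorted only to make ties deterministic
        counts = [count_occurrences(pattern, s) for s in strings]
        score = abs(pearson(counts, values))
        if score > best_score:
            best_pattern, best_score = pattern, score
    return best_pattern, best_score
```

A call such as `best_substring_pattern(["xab", "yabab", "zababab"], [1.0, 2.0, 3.0])` returns a pattern and its score. The candidate set can grow quadratically with the input length, so this sketch is only practical on small inputs.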

Original language: English
Title of host publication: Discovery Science - 8th International Conference, DS 2005, Proceedings
Pages: 44-56
Number of pages: 13
DOIs: https://doi.org/10.1007/11563983_6
Publication status: Published - Dec 1 2005
Event: 8th International Conference on Discovery Science, DS 2005 - Singapore
Duration: Oct 8 2005 to Oct 11 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3735 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 8th International Conference on Discovery Science, DS 2005
Country: Singapore
Period: 10/8/05 to 10/11/05

Fingerprint

Linear regression
Strings
Branch-and-bound
Numerics
Attribute
Suffix Tree
Heuristics

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Bannai, H., Hatano, K., Inenaga, S., & Takeda, M. (2005). Practical algorithms for pattern based linear regression. In Discovery Science - 8th International Conference, DS 2005, Proceedings (pp. 44-56). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3735 LNAI). https://doi.org/10.1007/11563983_6

Practical algorithms for pattern based linear regression. / Bannai, Hideo; Hatano, Kohei; Inenaga, Shunsuke; Takeda, Masayuki.

Discovery Science - 8th International Conference, DS 2005, Proceedings. 2005. p. 44-56 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3735 LNAI).


Bannai, H, Hatano, K, Inenaga, S & Takeda, M 2005, Practical algorithms for pattern based linear regression. in Discovery Science - 8th International Conference, DS 2005, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3735 LNAI, pp. 44-56, 8th International Conference on Discovery Science, DS 2005, Singapore, 10/8/05. https://doi.org/10.1007/11563983_6
Bannai H, Hatano K, Inenaga S, Takeda M. Practical algorithms for pattern based linear regression. In Discovery Science - 8th International Conference, DS 2005, Proceedings. 2005. p. 44-56. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/11563983_6
Bannai, Hideo ; Hatano, Kohei ; Inenaga, Shunsuke ; Takeda, Masayuki. / Practical algorithms for pattern based linear regression. Discovery Science - 8th International Conference, DS 2005, Proceedings. 2005. pp. 44-56 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{26831e1f2e5048269153d40ce104804b,
title = "Practical algorithms for pattern based linear regression",
abstract = "We consider the problem of discovering the optimal pattern from a set of strings and associated numeric attribute values. The goodness of a pattern is measured by the correlation between the number of occurrences of the pattern in each string, and the numeric attribute value assigned to the string. We present two algorithms based on suffix trees, that can find the optimal substring pattern in O(Nn) and O(N^2) time, respectively, where n is the number of strings and N is their total length. We further present a general branch and bound strategy that can be used when considering more complex pattern classes. We also show that combining the O(N^2) algorithm and the branch and bound heuristic increases the efficiency of the algorithm considerably.",
author = "Hideo Bannai and Kohei Hatano and Shunsuke Inenaga and Masayuki Takeda",
year = "2005",
month = "12",
day = "1",
doi = "10.1007/11563983_6",
language = "English",
isbn = "3540292306",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "44--56",
booktitle = "Discovery Science - 8th International Conference, DS 2005, Proceedings",

}

TY - GEN

T1 - Practical algorithms for pattern based linear regression

AU - Bannai, Hideo

AU - Hatano, Kohei

AU - Inenaga, Shunsuke

AU - Takeda, Masayuki

PY - 2005/12/1

Y1 - 2005/12/1

N2 - We consider the problem of discovering the optimal pattern from a set of strings and associated numeric attribute values. The goodness of a pattern is measured by the correlation between the number of occurrences of the pattern in each string, and the numeric attribute value assigned to the string. We present two algorithms based on suffix trees, that can find the optimal substring pattern in O(Nn) and O(N^2) time, respectively, where n is the number of strings and N is their total length. We further present a general branch and bound strategy that can be used when considering more complex pattern classes. We also show that combining the O(N^2) algorithm and the branch and bound heuristic increases the efficiency of the algorithm considerably.

AB - We consider the problem of discovering the optimal pattern from a set of strings and associated numeric attribute values. The goodness of a pattern is measured by the correlation between the number of occurrences of the pattern in each string, and the numeric attribute value assigned to the string. We present two algorithms based on suffix trees, that can find the optimal substring pattern in O(Nn) and O(N^2) time, respectively, where n is the number of strings and N is their total length. We further present a general branch and bound strategy that can be used when considering more complex pattern classes. We also show that combining the O(N^2) algorithm and the branch and bound heuristic increases the efficiency of the algorithm considerably.

UR - http://www.scopus.com/inward/record.url?scp=33745322333&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745322333&partnerID=8YFLogxK

U2 - 10.1007/11563983_6

DO - 10.1007/11563983_6

M3 - Conference contribution

AN - SCOPUS:33745322333

SN - 3540292306

SN - 9783540292302

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 44

EP - 56

BT - Discovery Science - 8th International Conference, DS 2005, Proceedings

ER -