A new family of string classifiers based on local relatedness

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr's), longest common subsequences (LCSeq's), and window-accumulated longest common subsequences (wLCSeq's). We show that finding the optimal classifier for two given sets of strings (the positive set and the negative set) is NP-hard for all of the above measurements. To obtain practically efficient algorithms for finding the best classifier, we investigate pruning heuristics and fast string matching techniques based on the properties of the local relatedness measurements.
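The first two relatedness measures named in the abstract can be illustrated with a minimal Python sketch. It uses the textbook dynamic programs for the lengths of the longest common substring and the longest common subsequence; it is not the paper's algorithm and includes none of its pruning heuristics or fast matching techniques. The function names, the thresholding rule "relatedness >= threshold means positive", and the `score` helper are assumptions made for illustration only, and wLCSeq is omitted because the abstract does not define it.

```python
from typing import Callable, Sequence


def lcstr_len(a: str, b: str) -> int:
    """Length of the longest common substring (contiguous) of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                # Extend the common suffix ending at the previous characters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def lcseq_len(a: str, b: str) -> int:
    """Length of the longest common subsequence (gaps allowed) of a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[len(b)]


def score(pattern: str, threshold: int,
          positives: Sequence[str], negatives: Sequence[str],
          measure: Callable[[str, str], int]) -> int:
    """Number of strings classified correctly by a hypothetical rule:
    a string is labeled positive when measure(pattern, s) >= threshold."""
    hits = sum(measure(pattern, s) >= threshold for s in positives)
    rejections = sum(measure(pattern, s) < threshold for s in negatives)
    return hits + rejections


if __name__ == "__main__":
    pos = ["xbcdy", "abcd"]
    neg = ["bxcxd", "zzz"]
    print(score("bcd", 3, pos, neg, lcstr_len))
    print(score("bcd", 3, pos, neg, lcseq_len))
```

With the same pattern and threshold, the two measures can disagree: "bxcxd" contains "bcd" as a subsequence but not as a substring, so the choice of measure changes which candidate classifier separates the two sets best.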

Original language: English
Title of host publication: Discovery Science - 9th International Conference, DS 2006, Proceedings
Publisher: Springer Verlag
Pages: 114-124
Number of pages: 11
Volume: 4265 LNAI
ISBN (Print): 3540464913, 9783540464914
Publication status: Published - 2006
Event: 9th International Conference on Discovery Science, DS 2006 - Barcelona, Spain
Duration: Oct 7 2006 to Oct 10 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4265 LNAI
ISSN (Print): 03029743
ISSN (Electronic): 16113349

Other

Other: 9th International Conference on Discovery Science, DS 2006
Country: Spain
City: Barcelona
Period: 10/7/06 to 10/10/06

Fingerprint

Longest Common Subsequence
Classifiers
Strings
Classifier
String Matching
Pruning
Efficient Algorithms
NP-complete problem
Heuristics
Family

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Higa, Y., Inenaga, S., Bannai, H., & Takeda, M. (2006). A new family of string classifiers based on local relatedness. In Discovery Science - 9th International Conference, DS 2006, Proceedings (Vol. 4265 LNAI, pp. 114-124). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4265 LNAI). Springer Verlag.

@inproceedings{e050465321824445b596bea7fef10e00,
title = "A new family of string classifiers based on local relatedness",
abstract = "This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr's), longest common subsequences (LCSeq's), and window-accumulated longest common subsequences (wLCSeq's). We show that finding the optimal classifier for two given sets of strings (the positive set and the negative set) is NP-hard for all of the above measurements. To obtain practically efficient algorithms for finding the best classifier, we investigate pruning heuristics and fast string matching techniques based on the properties of the local relatedness measurements.",
author = "Yasuto Higa and Shunsuke Inenaga and Hideo Bannai and Masayuki Takeda",
year = "2006",
language = "English",
isbn = "3540464913",
volume = "4265 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "114--124",
booktitle = "Discovery Science - 9th International Conference, DS 2006, Proceedings",
address = "Germany",

}

TY - GEN

T1 - A new family of string classifiers based on local relatedness

AU - Higa, Yasuto

AU - Inenaga, Shunsuke

AU - Bannai, Hideo

AU - Takeda, Masayuki

PY - 2006

Y1 - 2006

N2 - This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr's), longest common subsequences (LCSeq's), and window-accumulated longest common subsequences (wLCSeq's). We show that finding the optimal classifier for two given sets of strings (the positive set and the negative set) is NP-hard for all of the above measurements. To obtain practically efficient algorithms for finding the best classifier, we investigate pruning heuristics and fast string matching techniques based on the properties of the local relatedness measurements.

AB - This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr's), longest common subsequences (LCSeq's), and window-accumulated longest common subsequences (wLCSeq's). We show that finding the optimal classifier for two given sets of strings (the positive set and the negative set) is NP-hard for all of the above measurements. To obtain practically efficient algorithms for finding the best classifier, we investigate pruning heuristics and fast string matching techniques based on the properties of the local relatedness measurements.

UR - http://www.scopus.com/inward/record.url?scp=33750744852&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33750744852&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33750744852

SN - 3540464913

SN - 9783540464914

VL - 4265 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 114

EP - 124

BT - Discovery Science - 9th International Conference, DS 2006, Proceedings

PB - Springer Verlag

ER -