String Kernels based on variable-length-don't-care patterns

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a new string kernel based on variable-length-don't-care patterns (VLDC patterns). A VLDC pattern is an element of (Σ ∪ {⋆})*, where Σ is an alphabet and ⋆ is the variable-length-don't-care symbol that matches any string in Σ*. The number of VLDC patterns matching a given string s of length n is O(2^{2n}). We present an O(n^5) algorithm for computing the kernel value. We also propose variations of the kernel which modify the relative weights of each pattern. We evaluate our kernels using a support vector machine to classify spam data.
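As an illustration of the definition in the abstract, the following minimal Python sketch checks whether a VLDC pattern matches a string and counts, by brute force, the patterns up to a bounded length that match two strings. It is not the O(n^5) algorithm from the paper. The names `matches` and `count_common_patterns`, the use of `*` as a stand-in for the don't-care symbol, the length bound, the skipping of patterns with adjacent don't-care symbols, and the reading of the kernel as an unweighted count of common patterns are all assumptions made for this example.

```python
from itertools import product

STAR = "*"  # ASCII stand-in for the variable-length-don't-care symbol (assumption)


def matches(pattern: str, s: str) -> bool:
    """Return True if the VLDC pattern matches the whole string s."""
    # dp[j] is True when the pattern prefix consumed so far matches s[:j].
    dp = [j == 0 for j in range(len(s) + 1)]
    for p in pattern:
        if p == STAR:
            # The don't-care symbol absorbs any (possibly empty) substring,
            # so every position at or after a reachable one becomes reachable.
            reachable = False
            new = []
            for flag in dp:
                reachable = reachable or flag
                new.append(reachable)
        else:
            # An ordinary symbol must equal the next character of s.
            new = [False] + [dp[j] and s[j] == p for j in range(len(s))]
        dp = new
    return dp[len(s)]


def count_common_patterns(s: str, t: str, alphabet: str, max_len: int) -> int:
    """Brute-force count of patterns of length <= max_len matching both s and t.

    Hypothetical helper: exponential in max_len, for tiny inputs only.
    The alphabet is assumed not to contain the STAR character.
    """
    symbols = alphabet + STAR
    total = 0
    for k in range(max_len + 1):
        for tup in product(symbols, repeat=k):
            pattern = "".join(tup)
            if STAR + STAR in pattern:
                continue  # adjacent don't-cares add nothing new; skipped by assumption
            if matches(pattern, s) and matches(pattern, t):
                total += 1
    return total
```

For instance, `matches("a*b", "aab")` holds because the don't-care symbol absorbs the middle `a`, while `matches("ab", "abc")` does not, since the pattern has to cover the whole string. The enumeration grows exponentially with the length bound, which is the cost a polynomial-time kernel computation avoids.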

Original language: English
Title of host publication: Discovery Science - 11th International Conference, DS 2008, Proceedings
Pages: 308-318
Number of pages: 11
DOIs: https://doi.org/10.1007/978-3-540-88411-8-29
Publication status: Published - Dec 1 2008
Event: 11th International Conference on Discovery Science, DS 2008 - Budapest, Hungary
Duration: Oct 13 2008 → Oct 16 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5255 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th International Conference on Discovery Science, DS 2008
Country: Hungary
City: Budapest
Period: 10/13/08 → 10/16/08

Fingerprint

Pattern matching
Support vector machines
Strings
kernel
Spam
Classify
Computing
Evaluate

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Narisawa, K., Bannai, H., Hatano, K., Inenaga, S., & Takeda, M. (2008). String Kernels based on variable-length-don't-care patterns. In Discovery Science - 11th International Conference, DS 2008, Proceedings (pp. 308-318). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5255 LNAI). https://doi.org/10.1007/978-3-540-88411-8-29

@inproceedings{64345673e3184f31b18f3c581023c228,
title = "String Kernels based on variable-length-don't-care patterns",
abstract = "We propose a new string kernel based on variable-length-don't-care patterns (VLDC patterns). A VLDC pattern is an element of $(\Sigma \cup \{\star\})^*$, where $\Sigma$ is an alphabet and $\star$ is the variable-length-don't-care symbol that matches any string in $\Sigma^*$. The number of VLDC patterns matching a given string $s$ of length $n$ is $O(2^{2n})$. We present an $O(n^5)$ algorithm for computing the kernel value. We also propose variations of the kernel which modify the relative weights of each pattern. We evaluate our kernels using a support vector machine to classify spam data.",
author = "Kazuyuki Narisawa and Hideo Bannai and Kohei Hatano and Shunsuke Inenaga and Masayuki Takeda",
year = "2008",
month = "12",
day = "1",
doi = "10.1007/978-3-540-88411-8-29",
language = "English",
isbn = "3540884106",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "308--318",
booktitle = "Discovery Science - 11th International Conference, DS 2008, Proceedings",

}

TY - GEN

T1 - String Kernels based on variable-length-don't-care patterns

AU - Narisawa, Kazuyuki

AU - Bannai, Hideo

AU - Hatano, Kohei

AU - Inenaga, Shunsuke

AU - Takeda, Masayuki

PY - 2008/12/1

Y1 - 2008/12/1

AB - We propose a new string kernel based on variable-length-don't-care patterns (VLDC patterns). A VLDC pattern is an element of (Σ ∪ {⋆})*, where Σ is an alphabet and ⋆ is the variable-length-don't-care symbol that matches any string in Σ*. The number of VLDC patterns matching a given string s of length n is O(2^{2n}). We present an O(n^5) algorithm for computing the kernel value. We also propose variations of the kernel which modify the relative weights of each pattern. We evaluate our kernels using a support vector machine to classify spam data.

UR - http://www.scopus.com/inward/record.url?scp=56749183765&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=56749183765&partnerID=8YFLogxK

U2 - 10.1007/978-3-540-88411-8-29

DO - 10.1007/978-3-540-88411-8-29

M3 - Conference contribution

AN - SCOPUS:56749183765

SN - 3540884106

SN - 9783540884101

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 308

EP - 318

BT - Discovery Science - 11th International Conference, DS 2008, Proceedings

ER -