String Kernels based on variable-length-don't-care patterns

Kazuyuki Narisawa, Hideo Bannai, Kohei Hatano, Shunsuke Inenaga, Masayuki Takeda

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We propose a new string kernel based on variable-length-don't-care patterns (VLDC patterns). A VLDC pattern is an element of $(\Sigma \cup \{\star\})^*$, where $\Sigma$ is an alphabet and $\star$ is the variable-length-don't-care symbol that matches any string in $\Sigma^*$. The number of VLDC patterns matching a given string $s$ of length $n$ is $O(2^{2n})$. We present an $O(n^5)$ algorithm for computing the kernel value. We also propose variations of the kernel which modify the relative weights of each pattern. We evaluate our kernels using a support vector machine to classify spam data.
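To make the pattern definition concrete, the following is a minimal brute-force sketch, not the O(n^5) algorithm from the paper. It rests on several assumptions that the abstract does not state: the ASCII character `*` stands in for the don't-care symbol and is assumed not to occur in the alphabet, a pattern must match the whole string, and runs of consecutive don't-care symbols are treated as one. The function names `vldc_matches`, `matching_patterns`, and `common_pattern_count` are hypothetical, and counting the patterns shared by two strings is only one plausible unweighted reading of a VLDC kernel.

```python
from itertools import product

STAR = "*"  # ASCII stand-in for the don't-care symbol (assumed not in the alphabet)


def vldc_matches(pattern: str, s: str) -> bool:
    """Return True if `pattern` matches all of `s`, each STAR standing for any string."""
    # reach[j] is True when the pattern prefix read so far can produce s[:j].
    reach = [j == 0 for j in range(len(s) + 1)]
    for c in pattern:
        if c == STAR:
            # STAR absorbs zero or more characters: once some prefix is
            # reachable, every longer prefix is reachable too.
            seen = False
            for j in range(len(s) + 1):
                seen = seen or reach[j]
                reach[j] = seen
        else:
            # A literal consumes exactly one character equal to itself.
            reach = [False] + [reach[j] and s[j] == c for j in range(len(s))]
    return reach[len(s)]


def matching_patterns(s: str) -> set:
    """Enumerate, by brute force, the patterns without repeated STARs that match `s`."""
    symbols = sorted(set(s)) + [STAR]
    found = set()
    # A collapsed pattern matching s has at most len(s) literals and len(s) + 1 STARs.
    for length in range(2 * len(s) + 2):
        for candidate in product(symbols, repeat=length):
            p = "".join(candidate)
            if STAR * 2 not in p and vldc_matches(p, s):
                found.add(p)
    return found


def common_pattern_count(s: str, t: str) -> int:
    """Hypothetical unweighted kernel value: number of patterns matching both strings."""
    return len(matching_patterns(s) & matching_patterns(t))
```

Under these assumptions, `matching_patterns("ab")` yields 13 patterns, which is consistent with the stated bound of 2^(2n) = 16 for n = 2. The enumeration is exponential in the string length, so it is only useful for checking small cases by hand.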

Original language: English
Title of host publication: Discovery Science - 11th International Conference, DS 2008, Proceedings
Pages: 308-318
Number of pages: 11
Publication status: Published - Dec 1 2008
Event: 11th International Conference on Discovery Science, DS 2008 - Budapest, Hungary
Duration: Oct 13 2008 to Oct 16 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5255 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th International Conference on Discovery Science, DS 2008
Country/Territory: Hungary
City: Budapest
Period: 10/13/08 to 10/16/08

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
