Sparse substring pattern set discovery using linear programming boosting

Kazuaki Kashihara, Kohei Hatano, Hideo Bannai, Masayuki Takeda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we consider the problem of finding a small set of substring patterns that classifies the given documents well. We formulate the problem as a 1-norm soft margin optimization problem in which each dimension corresponds to a substring pattern. We then solve this problem using LPBoost together with an optimal substring discovery algorithm. Since the problem is a linear program, the resulting solution is likely to be sparse, which is useful for feature selection. We evaluate the proposed method on real data such as movie reviews.
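To make the role of the substring discovery step concrete, the following is a minimal sketch of the kind of subproblem a boosting method such as LPBoost poses in each round: given a weighting over the documents, find the substring whose presence or absence best agrees with the labels. It is a brute-force illustration, not the authors' algorithm. The function names `substrings` and `best_substring`, the `max_len` cap, and the convention that a pattern predicts +1 when it occurs in a document and -1 otherwise are all assumptions made for this example.

```python
from typing import Iterator, List, Tuple


def substrings(doc: str, max_len: int) -> Iterator[str]:
    """Yield each distinct substring of doc with at most max_len characters."""
    seen = set()
    for i in range(len(doc)):
        for j in range(i + 1, min(len(doc), i + max_len) + 1):
            s = doc[i:j]
            if s not in seen:
                seen.add(s)
                yield s


def best_substring(
    docs: List[str],
    labels: List[int],
    weights: List[float],
    max_len: int = 5,
) -> Tuple[str, float]:
    """Return the pattern with the largest absolute weighted edge.

    Assumed convention: a pattern p predicts +1 for a document that
    contains p and -1 otherwise. The edge of p is
    sum_i weights[i] * labels[i] * prediction_p(docs[i]).
    """
    # Collect candidate patterns by exhaustive enumeration (illustration only).
    candidates = set()
    for d in docs:
        candidates.update(substrings(d, max_len))

    best_pat, best_edge = "", 0.0
    for pat in sorted(candidates):  # sorted for deterministic tie-breaking
        edge = sum(
            w * y * (1 if pat in d else -1)
            for d, y, w in zip(docs, labels, weights)
        )
        if abs(edge) > abs(best_edge):
            best_pat, best_edge = pat, edge
    return best_pat, best_edge


if __name__ == "__main__":
    docs = ["great movie", "great plot", "dull movie", "dull plot"]
    labels = [1, 1, -1, -1]
    weights = [0.25, 0.25, 0.25, 0.25]
    print(best_substring(docs, labels, weights))
```

In a full boosting loop, the selected pattern would be added as a new column of the linear program and the document weights would be re-optimized before the next round. Exhaustive enumeration is only practical for toy inputs; it stands in here for the efficient optimal substring discovery algorithm that the abstract refers to.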

Original language: English
Title of host publication: Discovery Science - 13th International Conference, DS 2010, Proceedings
Pages: 132-143
Number of pages: 12
Publication status: Published - Dec 20 2010
Event: 13th International Conference on Discovery Science, DS 2010 - Canberra, ACT, Australia
Duration: Oct 6 2010 → Oct 8 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6332 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 13th International Conference on Discovery Science, DS 2010
Country: Australia
City: Canberra, ACT
Period: 10/6/10 → 10/8/10

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
