TY - GEN

T1 - Sparse substring pattern set discovery using linear programming boosting

AU - Kashihara, Kazuaki

AU - Hatano, Kohei

AU - Bannai, Hideo

AU - Takeda, Masayuki

PY - 2010/12/20

Y1 - 2010/12/20

N2 - In this paper, we consider finding a small set of substring patterns that classifies the given documents well. We formulate the problem as a 1-norm soft margin optimization problem in which each dimension corresponds to a substring pattern. We then solve this problem using LPBoost and an optimal substring discovery algorithm. Since the problem is a linear program, the resulting solution is likely to be sparse, which is useful for feature selection. We evaluate the proposed method on real data such as movie reviews.

AB - In this paper, we consider finding a small set of substring patterns that classifies the given documents well. We formulate the problem as a 1-norm soft margin optimization problem in which each dimension corresponds to a substring pattern. We then solve this problem using LPBoost and an optimal substring discovery algorithm. Since the problem is a linear program, the resulting solution is likely to be sparse, which is useful for feature selection. We evaluate the proposed method on real data such as movie reviews.

UR - http://www.scopus.com/inward/record.url?scp=78650100633&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650100633&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-16184-1_10

DO - 10.1007/978-3-642-16184-1_10

M3 - Conference contribution

AN - SCOPUS:78650100633

SN - 3642161839

SN - 9783642161834

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 132

EP - 143

BT - Discovery Science - 13th International Conference, DS 2010, Proceedings

T2 - 13th International Conference on Discovery Science, DS 2010

Y2 - 6 October 2010 through 8 October 2010

ER -