gBoost: A mathematical programming approach to graph classification and regression

Hiroto Saigo, Sebastian Nowozin, Tadashi Kadowaki, Taku Kudo, Koji Tsuda

Research output: Contribution to journal › Article

78 Citations (Scopus)

Abstract

Graph mining methods enumerate frequently appearing subgraph patterns, which can be used as features for subsequent classification or regression. However, frequent patterns are not necessarily informative for the given learning problem. We propose a mathematical programming boosting method (gBoost) that progressively collects informative patterns. Compared to AdaBoost, gBoost can build the prediction rule with fewer iterations. To apply the boosting method to graph data, a branch-and-bound pattern search algorithm is developed based on the DFS code tree. The constructed search space is reused in later iterations to minimize the computation time. Our method can learn more efficiently than the simpler method based on frequent substructure mining, because the output labels are used as an extra information source for pruning the search space. Furthermore, by engineering the mathematical program, a wide range of machine learning problems can be solved without modifying the pattern search algorithm.
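The abstract describes a column-generation loop: a search oracle repeatedly proposes the most informative pattern under the current example weights, and the mathematical program re-weights the examples before the next search. The sketch below is an illustrative stand-in, not the paper's algorithm: it replaces the linear program with an AdaBoost-style multiplicative weight update, and replaces the branch-and-bound subgraph search over the DFS code tree with a plain scan over precomputed binary pattern features. All names (`oracle`, `boost`, `predict`) are ours.

```python
import math

def oracle(patterns, X, y, d):
    """One search step: return the pattern (here, a binary feature index)
    with the largest weighted gain  g(h) = sum_i d[i] * y[i] * h(x_i),
    where h(x) = +1 if the pattern occurs in x and -1 otherwise.
    In gBoost this step is a branch-and-bound search over the DFS code
    tree, pruned using the labels; here it is a scan over a fixed list."""
    scored = []
    for j in patterns:
        g = sum(d[i] * y[i] * (1 if X[i][j] else -1) for i in range(len(X)))
        # A pattern and its negation are both admissible; keep the best sign.
        scored.append((abs(g), j, 1 if g >= 0 else -1))
    return max(scored)

def boost(X, y, n_patterns, iters=10):
    """Column-generation-style loop: ask the oracle for the most
    informative pattern under the current weights, add it to the model,
    and re-weight the examples."""
    n = len(X)
    d = [1.0 / n] * n            # example weights (dual variables in the LP view)
    model = []                   # collected (alpha, pattern, sign) columns
    for _ in range(iters):
        g, j, s = oracle(range(n_patterns), X, y, d)
        eps = 0.5 * (1.0 - g)    # weighted error of the chosen pattern stump
        if eps >= 0.5:           # no pattern better than chance: stop
            break
        alpha = 0.5 * math.log((1.0 - eps) / max(eps, 1e-12))
        model.append((alpha, j, s))
        if eps < 1e-12:          # a perfect pattern was found: stop
            break
        # Misclassified examples gain weight, steering the next search.
        h = [s * (1 if X[i][j] else -1) for i in range(n)]
        d = [d[i] * math.exp(-alpha * y[i] * h[i]) for i in range(n)]
        z = sum(d)
        d = [w / z for w in d]
    return model

def predict(model, x):
    score = sum(a * s * (1 if x[j] else -1) for a, j, s in model)
    return 1 if score >= 0 else -1
```

On a toy dataset such as `X = [[1,0,1],[1,1,0],[0,0,1],[0,1,0]]` with `y = [1,1,-1,-1]`, the loop selects the single informative feature (index 0) and ignores the uninformative ones, which is the behavior the abstract contrasts with plain frequent-pattern mining.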

Original language: English
Pages (from-to): 69-89
Number of pages: 21
Journal: Machine Learning
Volume: 75
Issue number: 1
DOI: 10.1007/s10994-008-5089-z
Publication status: Published - Apr 1 2009
Externally published: Yes


All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence

Cite this

gBoost: A mathematical programming approach to graph classification and regression. / Saigo, Hiroto; Nowozin, Sebastian; Kadowaki, Tadashi; Kudo, Taku; Tsuda, Koji.

In: Machine Learning, Vol. 75, No. 1, 01.04.2009, p. 69-89.

Research output: Contribution to journal › Article

@article{ccbc975271d2424e989901a82388ad49,
title = "gBoost: A mathematical programming approach to graph classification and regression",
author = "Hiroto Saigo and Sebastian Nowozin and Tadashi Kadowaki and Taku Kudo and Koji Tsuda",
year = "2009",
month = "4",
day = "1",
doi = "10.1007/s10994-008-5089-z",
language = "English",
volume = "75",
pages = "69--89",
journal = "Machine Learning",
issn = "0885-6125",
publisher = "Springer Netherlands",
number = "1",

}

TY - JOUR

T1 - gBoost

T2 - A mathematical programming approach to graph classification and regression

AU - Saigo, Hiroto

AU - Nowozin, Sebastian

AU - Kadowaki, Tadashi

AU - Kudo, Taku

AU - Tsuda, Koji

PY - 2009/4/1

Y1 - 2009/4/1


UR - http://www.scopus.com/inward/record.url?scp=60949105177&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=60949105177&partnerID=8YFLogxK

U2 - 10.1007/s10994-008-5089-z

DO - 10.1007/s10994-008-5089-z

M3 - Article

AN - SCOPUS:60949105177

VL - 75

SP - 69

EP - 89

JO - Machine Learning

JF - Machine Learning

SN - 0885-6125

IS - 1

ER -