### Abstract

A tree contraction pattern (TC-pattern) is an unordered tree-structured pattern that expresses a tree structure common to a given set of unordered trees. A TC-pattern has special vertices, called contractible vertices, into which every uncommon connected substructure is merged by edge contractions. In this paper, we propose a probabilistic method for solving a binary classification problem on tree-structured data. Given a positive set P and a negative set N of unordered trees with vertex labels over a finite alphabet, the problem is to find meaningful and optimal TC-patterns that classify P and N with high statistical measures. We formalize this problem as a multiple optimization problem and propose a probabilistic method for solving it that employs enumeration algorithms for TC-patterns and the Markov chain Monte Carlo method. In addition, as a theoretical aspect of the problem, we show its hardness of approximation. Finally, we show experimental results of our method on glycan structure data.
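As a rough illustration of the edge contraction operation mentioned in the abstract, the following minimal sketch merges one endpoint of an edge into the other in an unordered tree. The adjacency-dictionary representation and the name `contract_edge` are assumptions made for this example only; the sketch does not reproduce the paper's TC-pattern matching or enumeration algorithms, and it ignores vertex labels.

```python
from typing import Dict, Set

# Assumed representation: undirected adjacency, vertex -> set of neighbours.
Tree = Dict[int, Set[int]]


def contract_edge(tree: Tree, u: int, v: int) -> Tree:
    """Return a copy of `tree` in which the edge {u, v} is contracted into u."""
    if v not in tree.get(u, set()):
        raise ValueError(f"{{{u}, {v}}} is not an edge")
    # Copy every vertex except v, which disappears after the contraction.
    merged = {w: set(ns) for w, ns in tree.items() if w != v}
    merged[u].discard(v)
    # Re-attach the remaining neighbours of v to u.
    for w in tree[v]:
        if w != u:
            merged[w].discard(v)
            merged[w].add(u)
            merged[u].add(w)
    return merged


# Vertex 1 is adjacent to 0, 2 and 3; contracting {0, 1} merges 1 into 0.
example = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}
contracted = contract_edge(example, 0, 1)
```

Repeating such contractions over a connected substructure collapses it into a single vertex, which is the role the abstract assigns to contractible vertices.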

Original language | English |
---|---|
Title of host publication | Proceedings of the 7th IADIS International Conference Information Systems 2014, IS 2014 |
Publisher | IADIS |
Pages | 95-102 |
Number of pages | 8 |
ISBN (Electronic) | 9789898704047 |
Publication status | Published - Jan 1 2014 |
Event | 7th IADIS International Conference on Information Systems, IS 2014 - Madrid, Spain. Duration: Feb 28 2014 → Mar 2 2014 |

### Other

Other | 7th IADIS International Conference on Information Systems, IS 2014 |
---|---|
Country | Spain |
City | Madrid |
Period | 2/28/14 → 3/2/14 |

### All Science Journal Classification (ASJC) codes

- Hardware and Architecture
- Information Systems
- Software
- Computer Science Applications

### Cite this

**Discovery of tree structured patterns using Markov chain Monte Carlo method.** / Okamoto, Yasuhiro; Koyanagi, Kensuke; Shoudai, Takayoshi; Maruyama, Osamu.

*Proceedings of the 7th IADIS International Conference Information Systems 2014, IS 2014.* IADIS, pp. 95-102, 7th IADIS International Conference on Information Systems, IS 2014, Madrid, Spain, 2/28/14.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Discovery of tree structured patterns using Markov chain Monte Carlo method

AU - Okamoto, Yasuhiro

AU - Koyanagi, Kensuke

AU - Shoudai, Takayoshi

AU - Maruyama, Osamu

PY - 2014/1/1

Y1 - 2014/1/1

UR - http://www.scopus.com/inward/record.url?scp=84944051162&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84944051162&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84944051162

SP - 95

EP - 102

BT - Proceedings of the 7th IADIS International Conference Information Systems 2014, IS 2014

PB - IADIS

ER -