Extraction of tag tree patterns with contractible variables from irregular semistructured data

Tetsuhiro Miyahara, Yusuke Suzuki, Takayoshi Shoudai, Tomoyuki Uchida, Sachio Hirokawa, Kenichi Takahashi, Hiroaki Ueda

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    7 Citations (Scopus)

    Abstract

    Information extraction from semistructured data is becoming increasingly important. In order to extract meaningful or interesting contents from semistructured data, we need to extract common structured patterns from the data. Much semistructured data has irregularities such as missing or erroneous data. A tag tree pattern is an edge-labeled tree with ordered children which has tree structures of tags and structured variables. An edge label is a tag, a keyword or a wildcard, and a variable can be substituted by an arbitrary tree. In particular, a contractible variable matches any subtree, including a singleton vertex. A tag tree pattern is therefore suited for representing common tree-structured patterns in irregular semistructured data. We present a new method for extracting characteristic tag tree patterns from irregular semistructured data by using an algorithm for finding a least generalized tag tree pattern explaining given data. We report some experiments in which this method is applied to extracting characteristic tag tree patterns from irregular semistructured data.
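The matching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm and does not compute a least generalized pattern; it only checks whether one simplified pattern matches one tree. All names (`Edge`, `PEdge`, `Var`, `matches`) are hypothetical. The sketch assumes that a tag must equal the edge label, that a keyword must occur as a substring of the edge label, that a wildcard accepts any label, and that a variable stands for a run of consecutive sibling subtrees: at least one for an ordinary variable, possibly none for a contractible one.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Edge:
    """A data edge: its label plus the ordered edges below its child vertex."""
    label: str
    children: List["Edge"] = field(default_factory=list)


@dataclass
class Var:
    """A variable; a contractible one may also match nothing at all."""
    contractible: bool = False


@dataclass
class PEdge:
    """A pattern edge whose kind is 'tag', 'keyword' or 'wildcard'."""
    kind: str
    text: str = ""
    children: List[Union["PEdge", Var]] = field(default_factory=list)


def label_matches(p: PEdge, label: str) -> bool:
    # Assumed label semantics for this sketch only.
    if p.kind == "wildcard":
        return True
    if p.kind == "tag":
        return p.text == label
    if p.kind == "keyword":
        return p.text in label
    raise ValueError(f"unknown pattern edge kind: {p.kind}")


def matches(pats: List[Union[PEdge, Var]], edges: List[Edge]) -> bool:
    """Match an ordered list of pattern items against ordered sibling edges."""
    if not pats:
        return not edges
    head, rest = pats[0], pats[1:]
    if isinstance(head, Var):
        # Try every run length the variable may absorb (backtracking).
        start = 0 if head.contractible else 1
        return any(matches(rest, edges[k:]) for k in range(start, len(edges) + 1))
    if not edges:
        return False
    first = edges[0]
    return (label_matches(head, first.label)
            and matches(head.children, first.children)
            and matches(rest, edges[1:]))


# Pattern: a non-empty title, optional extra fields, a price that may be empty.
pattern = [
    PEdge("tag", "title", [Var()]),
    Var(contractible=True),
    PEdge("keyword", "price", [Var(contractible=True)]),
]

record_full = [
    Edge("title", [Edge("Intro to XML")]),
    Edge("author", [Edge("A. Smith")]),
    Edge("price", [Edge("30")]),
]
# Irregular record: no author field, and the price edge has no content.
record_sparse = [Edge("title", [Edge("Untitled notes")]), Edge("list-price")]
# The title has no content, so the ordinary variable cannot match.
record_empty_title = [Edge("title"), Edge("price")]

print(matches(pattern, record_full))
print(matches(pattern, record_sparse))
print(matches(pattern, record_empty_title))
```

In this sketch the contractible variables are what let a single pattern cover both the complete record and the record with missing data, while the ordinary variable under `title` still requires some content to be present.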

    Original language: English
    Title of host publication: Advances in Knowledge Discovery and Data Mining
    Editors: Kyu-Young Whang, Jongwoo Jeon, Kyuseok Shim, Jaideep Srivastava
    Publisher: Springer Verlag
    Pages: 430-436
    Number of pages: 7
    ISBN (Electronic): 3540047603, 9783540047605
    Publication status: Published - 2003
    Event: 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003 - Seoul, Korea, Republic of
    Duration: Apr 30 2003 to May 2 2003

    Publication series

    Name: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
    Volume: 2637
    ISSN (Print): 0302-9743

    Other

    Other: 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003
    Country: Korea, Republic of
    City: Seoul
    Period: 4/30/03 to 5/2/03

    All Science Journal Classification (ASJC) codes

    • Theoretical Computer Science
    • Computer Science (all)
