TY - GEN
T1 - Extraction of tag tree patterns with contractible variables from irregular semistructured data
AU - Miyahara, Tetsuhiro
AU - Suzuki, Yusuke
AU - Shoudai, Takayoshi
AU - Uchida, Tomoyuki
AU - Hirokawa, Sachio
AU - Takahashi, Kenichi
AU - Ueda, Hiroaki
N1 - Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2003.
PY - 2003
Y1 - 2003
N2 - Information Extraction from semistructured data becomes more and more important. In order to extract meaningful or interesting contents from semistructured data, we need to extract common structured patterns from semistructured data. Many semistructured data have irregularities such as missing or erroneous data. A tag tree pattern is an edge labeled tree with ordered children which has tree structures of tags and structured variables. An edge label is a tag, a keyword or a wildcard, and a variable can be substituted by an arbitrary tree. Especially, a contractible variable matches any subtree including a singleton vertex. So a tag tree pattern is suited for representing common tree structured patterns in irregular semistructured data. We present a new method for extracting characteristic tag tree patterns from irregular semistructured data by using an algorithm for finding a least generalized tag tree pattern explaining given data. We report some experiments of applying this method to extracting characteristic tag tree patterns from irregular semistructured data.
AB - Information Extraction from semistructured data becomes more and more important. In order to extract meaningful or interesting contents from semistructured data, we need to extract common structured patterns from semistructured data. Many semistructured data have irregularities such as missing or erroneous data. A tag tree pattern is an edge labeled tree with ordered children which has tree structures of tags and structured variables. An edge label is a tag, a keyword or a wildcard, and a variable can be substituted by an arbitrary tree. Especially, a contractible variable matches any subtree including a singleton vertex. So a tag tree pattern is suited for representing common tree structured patterns in irregular semistructured data. We present a new method for extracting characteristic tag tree patterns from irregular semistructured data by using an algorithm for finding a least generalized tag tree pattern explaining given data. We report some experiments of applying this method to extracting characteristic tag tree patterns from irregular semistructured data.
UR - http://www.scopus.com/inward/record.url?scp=7444232575&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=7444232575&partnerID=8YFLogxK
U2 - 10.1007/3-540-36175-8_43
DO - 10.1007/3-540-36175-8_43
M3 - Conference contribution
AN - SCOPUS:7444232575
T3 - Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
SP - 430
EP - 436
BT - Advances in Knowledge Discovery and Data Mining
A2 - Whang, Kyu-Young
A2 - Jeon, Jongwoo
A2 - Shim, Kyuseok
A2 - Srivastava, Jaideep
PB - Springer Verlag
T2 - 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003
Y2 - 30 April 2003 through 2 May 2003
ER -