Proposal of seam degree and content similarity for web page segmentation

Jun Zeng, Brendan Flanagan, Qingyu Xiong, Junhao Wen, Sachio Hirokawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Page segmentation has received considerable attention in recent years. However, most research has relied on pre-defined heuristics or visual cues, which may not be suitable for large-scale page segmentation. In this paper, we propose two parameters, seam degree and content similarity, to indicate the degree of coherence of a page block. Instead of analyzing pre-defined heuristics or visual cues, our method uses visual and content features to determine whether a page block should be divided into smaller blocks. We also propose a principled page segmentation method based on these two parameters. An experiment was conducted to determine the relationship between the two parameters and the number of resulting segments. The empirical results also show that our segmentation method can effectively segment a page into different semantic parts.

Original language: English
Title of host publication: Proceedings - 2nd IIAI International Conference on Advanced Applied Informatics, IIAI-AAI 2013
Pages: 9-14
Number of pages: 6
DOIs: https://doi.org/10.1109/IIAI-AAI.2013.56
Publication status: Published - Dec 16 2013
Event: 2nd IIAI International Conference on Advanced Applied Informatics, IIAI-AAI 2013 - Matsue, Japan
Duration: Aug 31 2013 to Sep 4 2013

Publication series

Name: Proceedings - 2nd IIAI International Conference on Advanced Applied Informatics, IIAI-AAI 2013

Other

Other: 2nd IIAI International Conference on Advanced Applied Informatics, IIAI-AAI 2013
Country: Japan
City: Matsue
Period: 8/31/13 to 9/4/13

All Science Journal Classification (ASJC) codes

  • Information Systems

  • Cite this

    Zeng, J., Flanagan, B., Xiong, Q., Wen, J., & Hirokawa, S. (2013). Proposal of seam degree and content similarity for web page segmentation. In Proceedings - 2nd IIAI International Conference on Advanced Applied Informatics, IIAI-AAI 2013 (pp. 9-14). [6630309] (Proceedings - 2nd IIAI International Conference on Advanced Applied Informatics, IIAI-AAI 2013). https://doi.org/10.1109/IIAI-AAI.2013.56