Proposal of seam degree and content similarity for web page segmentation

Jun Zeng, Brendan Flanagan, Qingyu Xiong, Junhao Wen, Sachio Hirokawa

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Page segmentation has received considerable attention in recent years. However, most research has relied on pre-defined heuristics or visual cues, which may not be suitable for large-scale page segmentation. In this paper, we propose two parameters, seam degree and content similarity, to indicate the degree of coherence of a page block. Instead of analyzing pre-defined heuristics or visual cues, our method uses visual and content features to determine whether a page block should be divided into smaller blocks. We also propose a principled page segmentation method based on these two parameters. An experiment was conducted to determine the relationship between the two parameters and the number of resulting segments. The empirical results also show that our segmentation method can effectively segment a page into different semantic parts.
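The split decision described in the abstract can be illustrated with a minimal sketch. The abstract does not define either parameter, so everything below is an assumption made for illustration: the `Block` type is hypothetical, `seam_degree` is treated as a precomputed number whose definition is not reproduced here, content similarity is approximated by mean pairwise cosine similarity over word counts, and the rule "split when seam degree is high and content similarity is low" together with both thresholds is a guess, not the authors' published method.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations
from math import sqrt
from typing import List


@dataclass
class Block:
    """Hypothetical page block: own text, a precomputed seam degree, child blocks."""
    text: str = ""
    seam_degree: float = 0.0  # assumed given; the paper's definition is not shown here
    children: List["Block"] = field(default_factory=list)


def full_text(block: Block) -> str:
    """Concatenate a block's own text with the text of all its descendants."""
    parts = [block.text] + [full_text(c) for c in block.children]
    return " ".join(parts).strip()


def cosine(a: str, b: str) -> float:
    """Cosine similarity of two strings over lowercase word counts."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta)
    norm = sqrt(sum(v * v for v in ta.values())) * sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def content_similarity(block: Block) -> float:
    """Assumed stand-in: mean pairwise cosine similarity among the child blocks."""
    pairs = list(combinations(block.children, 2))
    if not pairs:
        return 1.0  # a block with fewer than two children is treated as coherent
    return sum(cosine(full_text(x), full_text(y)) for x, y in pairs) / len(pairs)


def segment(block: Block, seam_threshold: float = 0.5, sim_threshold: float = 0.5) -> List[Block]:
    """Recursively split a block while it looks incoherent under the assumed rule."""
    should_split = (
        bool(block.children)
        and block.seam_degree >= seam_threshold
        and content_similarity(block) < sim_threshold
    )
    if not should_split:
        return [block]
    segments: List[Block] = []
    for child in block.children:
        segments.extend(segment(child, seam_threshold, sim_threshold))
    return segments


# Usage: a block whose children share no vocabulary is split; a coherent one is kept whole.
mixed = Block(seam_degree=0.9, children=[
    Block(text="latest football scores and match reports"),
    Block(text="privacy policy terms of use"),
])
coherent = Block(seam_degree=0.9, children=[
    Block(text="football match report"),
    Block(text="football match scores"),
])
print(len(segment(mixed)), len(segment(coherent)))
```

Under these assumptions, raising `seam_threshold` or lowering `sim_threshold` makes splitting less likely and so yields fewer segments, which is the kind of relationship between the parameters and the segment count that the abstract says the experiment examines.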

    Original language: English
    Title of host publication: Proceedings - 2nd IIAI International Conference on Advanced Applied Informatics, IIAI-AAI 2013
    Pages: 9-14
    Number of pages: 6
    DOIs
    Publication status: Published - 2013
    Event: 2nd IIAI International Conference on Advanced Applied Informatics, IIAI-AAI 2013 - Matsue, Japan
    Duration: Aug 31 2013 to Sep 4 2013

    Publication series

    Name: Proceedings - 2nd IIAI International Conference on Advanced Applied Informatics, IIAI-AAI 2013

    All Science Journal Classification (ASJC) codes

    • Information Systems
