A web page segmentation approach using seam degree and content similarity

Jun Zeng, Brendan Flanagan, Qingyu Xiong, Junhao Wen, Sachio Hirokawa

    Research output: Contribution to journal › Article › peer-review

    1 Citation (Scopus)

    Abstract

    Page segmentation has received considerable attention in recent years. However, most research has relied on pre-defined heuristics or visual cues, which may not be suitable for large-scale page segmentation. In this chapter, we propose two parameters, seam degree and content similarity, to indicate the degree of coherence of a page block. Instead of analyzing pre-defined heuristics or visual cues, our method uses visual and content features to determine whether a page block should be divided into smaller blocks. We also propose a principled page segmentation method based on these two parameters. An experiment was conducted to determine the relationship between the two parameters and the number of resulting segments. The empirical results also show that our segmentation method can effectively segment a page into distinct semantic parts.

    Original language: English
    Pages (from-to): 91-103
    Number of pages: 13
    Journal: Studies in Computational Intelligence
    Volume: 553
    Publication status: Published - Jan 1 2014

    All Science Journal Classification (ASJC) codes

    • Artificial Intelligence
