Layout-tree-based approach for identifying visually similar blocks in a web page

Jun Zeng, Brendan Flanagan, Sachio Hirokawa

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    2 Citations (Scopus)

    Abstract

    When extracting information from a web page, information extraction (IE) systems usually need to perform pattern recognition to identify elements that have similar patterns. However, most of them rely mainly on analyzing the HTML source code, DOM tree, tag tree, or XPath of web pages. These methods are language-dependent or, more precisely, HTML-dependent, and they have some insuperable limitations. To overcome these limitations, we propose the notion of a layout tree and a pattern recognition method that uses layout trees to identify visual blocks with similar visual patterns. In this paper, we call a visible rectangular region in a web page a visual block, or block for short. We define two blocks as visually similar if their elements are displayed in a similar layout. We first transform the layout of each block into a layout tree. By calculating the similarity of the layout trees of two blocks, we can determine whether the two blocks are visually similar. The experimental results show that the layout tree is an effective means of identifying visually similar blocks.
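The abstract does not say how a layout tree is built or how tree similarity is scored, so the following is only an illustrative sketch under assumed definitions, not the authors' method. It assumes that each block is a rectangle with nested child blocks, that children are compared in reading order, and that similarity is the average agreement of the children's positions and sizes relative to their parent. The names `Block`, `relative_box`, and `similarity` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    """A visible rectangular region with nested child blocks (assumed model)."""
    x: float
    y: float
    w: float
    h: float
    children: List["Block"] = field(default_factory=list)


def relative_box(parent: Block, child: Block) -> Tuple[float, float, float, float]:
    """Express the child's rectangle as fractions of the parent's rectangle.

    Assumes the parent has non-zero width and height (it is a visible block).
    """
    return (
        (child.x - parent.x) / parent.w,
        (child.y - parent.y) / parent.h,
        child.w / parent.w,
        child.h / parent.h,
    )


def similarity(a: Block, b: Block) -> float:
    """Assumed similarity score in [0, 1]; 1.0 means identical relative layout."""
    if not a.children and not b.children:
        return 1.0  # two leaves: nothing left to compare
    if not a.children or not b.children:
        return 0.0  # a leaf compared with a container
    # Reading order: top-to-bottom, then left-to-right.
    kids_a = sorted(a.children, key=lambda c: (c.y, c.x))
    kids_b = sorted(b.children, key=lambda c: (c.y, c.x))
    total = 0.0
    for ca, cb in zip(kids_a, kids_b):
        box_a, box_b = relative_box(a, ca), relative_box(b, cb)
        # Mean absolute difference of the four relative coordinates.
        diff = sum(abs(p - q) for p, q in zip(box_a, box_b)) / 4
        geometry = max(0.0, 1.0 - diff)
        total += geometry * similarity(ca, cb)
    # Children without a counterpart contribute zero.
    return total / max(len(kids_a), len(kids_b))


# Two side-by-side blocks at different page positions and scales,
# and one block whose children are stacked vertically instead.
row_a = Block(0, 0, 100, 50, [Block(0, 0, 50, 50), Block(50, 0, 50, 50)])
row_b = Block(10, 200, 200, 100, [Block(10, 200, 100, 100), Block(110, 200, 100, 100)])
column = Block(0, 0, 100, 50, [Block(0, 0, 100, 25), Block(0, 25, 100, 25)])

print(similarity(row_a, row_b))
print(similarity(row_a, column))
```

Because the comparison uses only rendered rectangles, it does not depend on the HTML markup that produced them, which is the property the abstract emphasizes. In this sketch the two side-by-side blocks score higher against each other than against the vertically stacked block.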

    Original language: English
    Title of host publication: 2013 IEEE/ACIS 12th International Conference on Computer and Information Science, ICIS 2013 - Proceedings
    Pages: 65-70
    Number of pages: 6
    DOI
    Publication status: Published - 2013
    Event: 2013 IEEE/ACIS 12th International Conference on Computer and Information Science, ICIS 2013 - Niigata, Japan
    Duration: Jun 16 2013 to Jun 20 2013

    Publication series

    Name: 2013 IEEE/ACIS 12th International Conference on Computer and Information Science, ICIS 2013 - Proceedings

    Other

    Other: 2013 IEEE/ACIS 12th International Conference on Computer and Information Science, ICIS 2013
    Country/Territory: Japan
    City: Niigata
    Period: 6/16/13 to 6/20/13

    All Science Journal Classification (ASJC) codes

    • Computer Science (miscellaneous)
    • Information Systems
