A web page segmentation approach using seam degree and content similarity

Jun Zeng, Brendan Flanagan, Qingyu Xiong, Junhao Wen, Sachio Hirokawa

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Page segmentation has received considerable attention in recent years. However, most research has been based on pre-defined heuristics or visual cues, which may not be suitable for large-scale page segmentation. In this chapter, we propose two parameters, seam degree and content similarity, to indicate the degree of coherence of a page block. Instead of analyzing pre-defined heuristics or visual cues, our method uses visual and content features to determine whether a page block should be divided into smaller blocks. We also propose a principled page segmentation method using these two parameters. An experiment was conducted to determine the relationship between the two parameters and the number of resulting segments. The empirical results also show that our segmentation method can effectively segment a page into different semantic parts.
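The decision described in the abstract (divide a page block only when it is not coherent enough) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define either parameter, so everything here is an assumption made for illustration. The `Block` class, the thresholds, and the split rule are hypothetical; seam degree is taken as a precomputed number in [0, 1], and content similarity is approximated by the mean pairwise cosine similarity of the child blocks' word counts.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations
from math import sqrt
from typing import List


@dataclass
class Block:
    """Hypothetical page block: own text, a seam degree, and child blocks."""
    text: str = ""
    seam_degree: float = 0.0  # assumed to be precomputed, in [0, 1]
    children: List["Block"] = field(default_factory=list)

    def full_text(self) -> str:
        # Text of this block plus all of its descendants.
        parts = [self.text] + [c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)


def cosine(a: str, b: str) -> float:
    """Cosine similarity of two texts using whitespace-token counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def content_similarity(block: Block) -> float:
    """Assumed measure: mean pairwise cosine similarity of the children."""
    pairs = list(combinations([c.full_text() for c in block.children], 2))
    if not pairs:
        return 1.0  # zero or one child: treat the block as coherent
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def segment(block: Block, seam_threshold: float = 0.5,
            sim_threshold: float = 0.3) -> List[Block]:
    """Recursively divide a block while it looks incoherent.

    Assumed rule: divide when the seam degree is high or the children
    have dissimilar content; otherwise keep the block as one segment.
    """
    divisible = bool(block.children) and (
        block.seam_degree >= seam_threshold
        or content_similarity(block) < sim_threshold
    )
    if not divisible:
        return [block]
    segments: List[Block] = []
    for child in block.children:
        segments.extend(segment(child, seam_threshold, sim_threshold))
    return segments


# Usage: a page with a navigation bar and an article of two similar paragraphs.
nav = Block(text="home about contact")
article = Block(seam_degree=0.1, children=[
    Block(text="page segmentation method"),
    Block(text="page segmentation results"),
])
page = Block(seam_degree=0.8, children=[nav, article])
segments = segment(page)
```

In this sketch, raising either threshold changes how many segments are produced, which is the kind of relationship between the two parameters and the segment count that the abstract says the experiment examined.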

Original language: English
Pages (from-to): 91-103
Number of pages: 13
Journal: Studies in Computational Intelligence
Volume: 553
DOI: 10.1007/978-3-319-05717-0_7
Publication status: Published - Jan 1 2014

Fingerprint

Websites
Semantics
Experiments

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence

Cite this

A web page segmentation approach using seam degree and content similarity. / Zeng, Jun; Flanagan, Brendan; Xiong, Qingyu; Wen, Junhao; Hirokawa, Sachio.

In: Studies in Computational Intelligence, Vol. 553, 01.01.2014, p. 91-103.

@article{6309aa256d5841e0b8aa511b222c1e2f,
title = "A web page segmentation approach using seam degree and content similarity",
abstract = "Page segmentation has received considerable attention in recent years. However, most research has been based on pre-defined heuristics or visual cues, which may not be suitable for large-scale page segmentation. In this chapter, we propose two parameters, seam degree and content similarity, to indicate the degree of coherence of a page block. Instead of analyzing pre-defined heuristics or visual cues, our method uses visual and content features to determine whether a page block should be divided into smaller blocks. We also propose a principled page segmentation method using these two parameters. An experiment was conducted to determine the relationship between the two parameters and the number of resulting segments. The empirical results also show that our segmentation method can effectively segment a page into different semantic parts.",
author = "Jun Zeng and Brendan Flanagan and Qingyu Xiong and Junhao Wen and Sachio Hirokawa",
year = "2014",
month = "1",
day = "1",
doi = "10.1007/978-3-319-05717-0_7",
language = "English",
volume = "553",
pages = "91--103",
journal = "Studies in Computational Intelligence",
issn = "1860-949X",
publisher = "Springer Verlag",

}

TY - JOUR

T1 - A web page segmentation approach using seam degree and content similarity

AU - Zeng, Jun

AU - Flanagan, Brendan

AU - Xiong, Qingyu

AU - Wen, Junhao

AU - Hirokawa, Sachio

PY - 2014/1/1

Y1 - 2014/1/1

AB - Page segmentation has received considerable attention in recent years. However, most research has been based on pre-defined heuristics or visual cues, which may not be suitable for large-scale page segmentation. In this chapter, we propose two parameters, seam degree and content similarity, to indicate the degree of coherence of a page block. Instead of analyzing pre-defined heuristics or visual cues, our method uses visual and content features to determine whether a page block should be divided into smaller blocks. We also propose a principled page segmentation method using these two parameters. An experiment was conducted to determine the relationship between the two parameters and the number of resulting segments. The empirical results also show that our segmentation method can effectively segment a page into different semantic parts.

UR - http://www.scopus.com/inward/record.url?scp=84926614053&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84926614053&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-05717-0_7

DO - 10.1007/978-3-319-05717-0_7

M3 - Article

AN - SCOPUS:84926614053

VL - 553

SP - 91

EP - 103

JO - Studies in Computational Intelligence

JF - Studies in Computational Intelligence

SN - 1860-949X

ER -