A web page segmentation approach using visual semantics

Jun Zeng, Brendan Flanagan, Sachio Hirokawa, Eisuke Ito

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

Web page segmentation has a variety of benefits and potential web applications. Early web page segmentation techniques are mainly based on machine learning algorithms and rule-based heuristics, which cannot be used for large-scale page segmentation. In this paper, we propose a formulated page segmentation method using visual semantics. Instead of analyzing the visual cues of web pages, this method uses three measures to formulate the visual semantics: a layout tree is used to recognize visually similar blocks; seam degree describes how neatly the blocks are arranged; and content similarity describes the degree of content coherence between blocks. A comparison experiment was conducted using the VIPS algorithm as a baseline. Experimental results show that the proposed method can divide a web page into appropriate semantic segments.
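The abstract names content similarity between blocks as one of the three measures but does not give its formula. As a rough illustration only, the sketch below scores two blocks of text with cosine similarity over bag-of-words term counts. This choice of measure, and the names `term_vector`, `cosine_similarity`, and `content_similarity`, are assumptions made for this example and are not taken from the paper.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lowercased bag-of-words term counts for one block's text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def content_similarity(block_a: str, block_b: str) -> float:
    """Hypothetical stand-in for a content similarity score between two blocks."""
    return cosine_similarity(term_vector(block_a), term_vector(block_b))
```

Under this assumed measure, two blocks that share no terms score 0.0 and two blocks with identical term counts score approximately 1.0, so a segmenter could treat neighbouring blocks with a high score as candidates for the same semantic segment.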

Original language: English
Pages (from-to): 223-230
Number of pages: 8
Journal: IEICE Transactions on Information and Systems
Volume: E97-D
Issue number: 2
DOI: 10.1587/transinf.E97.D.223
Publication status: Published - Jan 1 2014

Fingerprint

Websites
Semantics
World Wide Web
Learning algorithms
Learning systems
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Cite this

A web page segmentation approach using visual semantics. / Zeng, Jun; Flanagan, Brendan; Hirokawa, Sachio; Ito, Eisuke.

In: IEICE Transactions on Information and Systems, Vol. E97-D, No. 2, 01.01.2014, p. 223-230.

@article{eb106e2a963545a89f794c193743b6de,
title = "A web page segmentation approach using visual semantics",
abstract = "Web page segmentation has a variety of benefits and potential web applications. Early techniques of web page segmentation are mainly based on machine learning algorithms and rule-based heuristics, which cannot be used for large-scale page segmentation. In this paper, we propose a formulated page segmentation method using visual semantics. Instead of analyzing the visual cues of web pages, this method utilizes three measures to formulate the visual semantics: layout tree is used to recognize the visual similar blocks; seam degree is used to describe how neatly the blocks are arranged; content similarity is used to describe the content coherent degree between blocks. A comparison experiment was done using the VIPS algorithm as a baseline. Experiment results show that the proposed method can divide a Web page into appropriate semantic segments.",
author = "Jun Zeng and Brendan Flanagan and Sachio Hirokawa and Eisuke Ito",
year = "2014",
month = "1",
day = "1",
doi = "10.1587/transinf.E97.D.223",
language = "English",
volume = "E97-D",
pages = "223--230",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "The Institute of Electronics, Information and Communication Engineers",
number = "2",
}

TY - JOUR
T1 - A web page segmentation approach using visual semantics
AU - Zeng, Jun
AU - Flanagan, Brendan
AU - Hirokawa, Sachio
AU - Ito, Eisuke
PY - 2014/1/1
Y1 - 2014/1/1
AB - Web page segmentation has a variety of benefits and potential web applications. Early techniques of web page segmentation are mainly based on machine learning algorithms and rule-based heuristics, which cannot be used for large-scale page segmentation. In this paper, we propose a formulated page segmentation method using visual semantics. Instead of analyzing the visual cues of web pages, this method utilizes three measures to formulate the visual semantics: layout tree is used to recognize the visual similar blocks; seam degree is used to describe how neatly the blocks are arranged; content similarity is used to describe the content coherent degree between blocks. A comparison experiment was done using the VIPS algorithm as a baseline. Experiment results show that the proposed method can divide a Web page into appropriate semantic segments.

UR - http://www.scopus.com/inward/record.url?scp=84893333574&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893333574&partnerID=8YFLogxK
U2 - 10.1587/transinf.E97.D.223
DO - 10.1587/transinf.E97.D.223
M3 - Article
AN - SCOPUS:84893333574
VL - E97-D
SP - 223
EP - 230
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
SN - 0916-8532
IS - 2
ER -