Using WWW-distribution of words in detecting peculiar web pages

Masayuki Hirose, Einoshin Suzuki

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

In this paper, we propose TFIGF, a method that, given a set of keywords, detects peculiar web pages using the distribution of words on the WWW. TFIGF selects a set of index words that represent a WWW page by estimating their importance within the page and their rareness on the WWW. Experiments using both English and Japanese WWW pages clearly show the superiority of our approach over a traditional method that uses only a limited number of WWW pages for the estimation.
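
The abstract does not state the scoring formula, so the following is only an illustrative sketch of the general idea of weighting a word's importance in a page by its rareness on the WWW. It assumes a TF-IDF-style product, and it assumes that web-wide occurrence counts are supplied by the caller as a plain dictionary. The function names, the `global_hits` mapping, and the `total_pages` parameter are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def rarity_weighted_scores(page_words, global_hits, total_pages):
    """Score each word as (in-page frequency) * log(total_pages / global hits).

    Assumption: this TF-IDF-style product is a stand-in for illustration,
    not the formula used in the paper.
    """
    if not page_words:
        return {}
    counts = Counter(page_words)
    n = len(page_words)
    scores = {}
    for word, count in counts.items():
        tf = count / n  # importance of the word within this page
        # Assumption: words missing from global_hits are treated as
        # appearing on a single page, i.e. as maximally rare.
        hits = max(global_hits.get(word, 1), 1)
        rarity = math.log(total_pages / hits)  # rareness across the web
        scores[word] = tf * rarity
    return scores


def top_index_words(page_words, global_hits, total_pages, k=3):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    scores = rarity_weighted_scores(page_words, global_hits, total_pages)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy data, invented for this example only.
page = ["the", "axolotl", "the", "pond", "axolotl", "the"]
hits = {"the": 1_000_000, "axolotl": 100, "pond": 10_000}
print(top_index_words(page, hits, total_pages=1_000_000, k=2))
```

In this toy input, a word that occurs on every page gets a rarity factor of zero, so frequent but uninformative words drop out of the index-word set even when they dominate the page.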

Original language: English
Pages (from-to): 355-362
Number of pages: 8
Journal: Lecture Notes in Computer Science
Volume: 3245
Publication status: Published - 2004
Externally published: Yes

Fingerprint

World Wide Web
Websites
Experiment
Experiments

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Theoretical Computer Science

Cite this

Using WWW-distribution of words in detecting peculiar web pages. / Hirose, Masayuki; Suzuki, Einoshin.

In: Lecture Notes in Computer Science, Vol. 3245, 2004, p. 355-362.

@article{5fff2300c3914af8a29386a9446f73de,
title = "Using WWW-distribution of words in detecting peculiar web pages",
abstract = "In this paper, we propose TFIGF, a method which detects peculiar web pages using distribution of words in WWW given a set of keywords. Our TFIGF detects a set of index words which represent a WWW page by estimating their importance in the WWW page and their rareness in WWW. Experiments using both English and Japanese WWW pages clearly show superiority of our approach over a traditional method which employs a limited number of WWW pages in the estimation.",
author = "Masayuki Hirose and Einoshin Suzuki",
year = "2004",
language = "English",
volume = "3245",
pages = "355--362",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",

}

TY - JOUR

T1 - Using WWW-distribution of words in detecting peculiar web pages

AU - Hirose, Masayuki

AU - Suzuki, Einoshin

PY - 2004

Y1 - 2004

N2 - In this paper, we propose TFIGF, a method which detects peculiar web pages using distribution of words in WWW given a set of keywords. Our TFIGF detects a set of index words which represent a WWW page by estimating their importance in the WWW page and their rareness in WWW. Experiments using both English and Japanese WWW pages clearly show superiority of our approach over a traditional method which employs a limited number of WWW pages in the estimation.

UR - http://www.scopus.com/inward/record.url?scp=33751108639&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33751108639&partnerID=8YFLogxK

M3 - Article

VL - 3245

SP - 355

EP - 362

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -