Using WWW-distribution of words in detecting peculiar web pages

Masayuki Hirose, Einoshin Suzuki

Research output: Chapter in Book/Report/Conference proceeding › Chapter

1 Citation (Scopus)

Abstract

In this paper, we propose TFIGF, a method that detects peculiar web pages for a given set of keywords using the distribution of words on the WWW. TFIGF detects a set of index words that represent a WWW page by estimating their importance in the page and their rareness on the WWW. Experiments using both English and Japanese WWW pages clearly show the superiority of our approach over a traditional method that employs a limited number of WWW pages in the estimation.
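
The abstract does not state the TFIGF formula, so the following is only a minimal sketch of the general idea it describes: scoring a word by its importance within a page multiplied by its rareness across the web. It assumes a TF-IDF-style analogue in which the rareness term is the logarithm of an estimated total page count divided by a web-wide hit count for the word. The names `tfigf_scores`, `index_words`, `global_hits`, and `total_pages` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfigf_scores(page_words, global_hits, total_pages):
    """Score each word on a page by in-page frequency times web-wide rareness.

    page_words:  list of tokens from one page
    global_hits: hypothetical dict mapping word -> number of web pages containing it
    total_pages: hypothetical estimate of the total number of web pages
    """
    counts = Counter(page_words)
    n = len(page_words)
    scores = {}
    for word, count in counts.items():
        tf = count / n  # importance of the word within this page
        # Assumption: a word with no recorded hits is treated as appearing on one page.
        hits = max(global_hits.get(word, 1), 1)
        rareness = math.log(total_pages / hits)  # larger for globally rare words
        scores[word] = tf * rareness
    return scores


def index_words(page_words, global_hits, total_pages, k=3):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = tfigf_scores(page_words, global_hits, total_pages)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy data, invented purely for illustration.
page = ["the", "the", "zeppelin", "page"]
hits = {"the": 900_000, "zeppelin": 10, "page": 500_000}
print(index_words(page, hits, total_pages=1_000_000))
```

In this toy input the globally rare word ranks first even though it occurs only once on the page, which is the behaviour the abstract attributes to combining in-page importance with rareness on the WWW. How the paper actually estimates either quantity may differ from this sketch.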

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Einoshin Suzuki, Setsuo Arikawa
Publisher: Springer Verlag
Pages: 355-362
Number of pages: 8
ISBN (Print): 9783540233572
DOIs
Publication status: Published - 2004
Externally published: Yes

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3245
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
