The volume of data available on the WWW has become enormous, and finding information on the WWW is difficult for a novice user even with standard search engines. One solution is to build a user-specific search engine whose database holds a large number of the web documents that the user needs. In this paper, we present a method for building a crawler that searches the subset of the WWW consisting of on-topic pages. We show an effective strategy for leading the crawler to on-topic pages by using a naive Bayes text classifier trained on evaluations of the pages gathered by the crawler.
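The strategy described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how links are scored or how pages are evaluated, so the sketch assumes that each outgoing link is ranked by the classifier's on-topic probability for its anchor text, and that page evaluation is supplied by a caller-provided function. All names (`NaiveBayes`, `focused_crawl`, `TOY_WEB`, `is_on_topic`) are hypothetical, and an in-memory dictionary stands in for the WWW so the example runs without network access.

```python
import heapq
import math
from collections import Counter


class NaiveBayes:
    """Two-class multinomial naive Bayes with Laplace smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def train(self, text, on_topic):
        """Update the word and document counts with one labelled page."""
        words = text.lower().split()
        self.word_counts[on_topic].update(words)
        self.doc_counts[on_topic] += 1
        self.vocab.update(words)

    def prob_on_topic(self, text):
        """Return the estimated P(on-topic | text)."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (True, False):
            # Smoothed class prior.
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            # Smoothed word likelihoods; the extra +1 reserves mass for
            # words outside the vocabulary and keeps the denominator nonzero.
            denom = sum(self.word_counts[label].values()) + len(self.vocab) + 1
            for w in words:
                log_p += math.log((self.word_counts[label][w] + 1) / denom)
            log_scores[label] = log_p
        # Normalise in log space to avoid underflow.
        m = max(log_scores.values())
        exp_true = math.exp(log_scores[True] - m)
        exp_false = math.exp(log_scores[False] - m)
        return exp_true / (exp_true + exp_false)


def focused_crawl(web, seeds, classifier, is_on_topic, max_pages=10):
    """Visit pages best-first, retraining the classifier on each fetched page.

    web: dict mapping url -> (page_text, [(anchor_text, target_url), ...])
    is_on_topic: evaluation function for fetched pages (an assumption here).
    """
    # heapq is a min-heap, so scores are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = web[url]
        visited.append(url)
        # Train on the evaluation of the page that was just gathered.
        classifier.train(text, is_on_topic(text))
        for anchor, target in links:
            if target in web and target not in seen:
                seen.add(target)
                # Assumed link score: on-topic probability of the anchor text.
                score = classifier.prob_on_topic(anchor)
                heapq.heappush(frontier, (-score, target))
    return visited


# Toy stand-in for the WWW (hypothetical data).
TOY_WEB = {
    "seed": ("python crawler tutorial",
             [("crawler code", "a"), ("cooking recipes", "b")]),
    "a": ("crawler code for web search", [("search crawler", "c")]),
    "b": ("cooking recipes with pasta", [("pasta cooking", "d")]),
    "c": ("search crawler design", []),
    "d": ("pasta cooking tips", []),
}


def is_on_topic(text):
    """Hypothetical evaluation: a page is on-topic if it mentions 'crawler'."""
    return "crawler" in text.lower().split()


if __name__ == "__main__":
    print(focused_crawl(TOY_WEB, ["seed"], NaiveBayes(), is_on_topic, max_pages=5))
```

In this toy graph the crawler reaches the crawler-related pages before the cooking pages, because each fetched page updates the classifier that ranks the remaining frontier. A real crawler would replace `TOY_WEB` with HTTP fetching and link extraction, and would need a source of page evaluations such as user feedback.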
|Number of pages|5|
|Journal|Research Reports on Information Science and Electrical Engineering of Kyushu University|
|Publication status|Published - Mar 2004|
All Science Journal Classification (ASJC) codes
- Electrical and Electronic Engineering
- Hardware and Architecture
- Engineering (miscellaneous)