An experiment of automatic classification of books using Nippon Decimal Classification

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

In information retrieval, texts are usually retrieved by matching them with queries. This study proposes an approach in which texts are automatically classified into categories and then retrieved by matching them with queries classified in the same way. For efficient information retrieval using automatic classification, the methods used to extract words from texts and to match them are essential. Several methods for extracting words from Japanese texts have been proposed in natural language processing. However, it is difficult to extract significant words from Japanese texts because they are written without blank spaces separating words. As for matching methods, many weighting methods have been proposed, as well as vector space models and probabilistic models. This article reports the results of an experiment in classifying Japanese texts into Nippon Decimal Classification (NDC) categories based on the title information in Japanese MARC records. In this experiment, three extraction methods (juman, MHSA, and n-gram) are tested on a set of 1,000 books. Four weighting methods, including relative term frequency between categories, tf·idf, and tf(max)·idf, are tested. The results indicate that extraction using juman performed best and that the best weighting method was the relative term frequency between categories, which selected the correct classification category (the upper three digits of NDC) for about 55.9% of the 1,000 books.
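
The abstract names n-gram extraction and tf·idf weighting but does not give their formulas, so the following is only a minimal sketch of how such a pipeline might look, not the authors' implementation. It assumes character bigrams, a textbook tf·idf in which each category is treated as a single document, and a simple sum-of-weights score. The function names, the toy titles, and their NDC labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams (no word segmentation required)."""
    text = "".join(text.split())  # drop any whitespace
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(labeled_titles, n=2):
    """Compute tf-idf weights per category (assumed formula: tf * log(N / df))."""
    tf = defaultdict(Counter)
    for title, category in labeled_titles:
        tf[category].update(char_ngrams(title, n))

    # Document frequency: in how many categories does each n-gram occur?
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())

    num_categories = len(tf)
    return {
        category: {
            term: freq * math.log(num_categories / df[term])
            for term, freq in counts.items()
        }
        for category, counts in tf.items()
    }


def classify(title, weights, n=2):
    """Pick the category whose weights sum highest over the title's n-grams."""
    scores = {
        category: sum(w.get(term, 0.0) for term in char_ngrams(title, n))
        for category, w in weights.items()
    }
    return max(scores, key=scores.get)


# Made-up training titles with three-digit labels, for illustration only.
TOY_TITLES = [
    ("情報検索の基礎", "007"),
    ("情報処理入門", "007"),
    ("数学の基礎", "410"),
    ("数学入門", "410"),
]

weights = train(TOY_TITLES)
predicted = classify("情報検索", weights)
```

In this toy setup, n-grams shared by both categories (such as "基礎" or "入門") receive zero weight, so only category-specific n-grams influence the score. A morphological analyzer such as juman could replace `char_ngrams` as the extraction step without changing the rest of the sketch.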

Original language: English
Pages (from-to): 31-45
Number of pages: 15
Journal: Library and Information Science
39
Publication status: Published - 12 1 1998
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Library and Information Sciences
