Automatic identification of academic articles in Japanese PDF files

Teru Agata, Atsushi Ikeuchi, Emi Ishida, Michiko Nozue, Takashi Kuno, Shuichi Ueda

Research output: Contribution to journal › Article

3 Citations (Scopus)


As open-access policies gain acceptance, an increasing number of researchers are contributing their papers to publicly accessible web sites (i.e. self-archiving). In theory, these papers can be found through standard search engines, but they tend to be obscured by other content on the web. The purpose of this research is to develop a system that can automatically detect academic and/or quasi-academic articles on the web. This paper describes experiments on the performance of various classifiers and compares the results in terms of precision, recall, and F-measure. The classifiers use attributes such as terms in PDF files and empirical rules. The results suggest the efficiency of a ranked-output system that identifies academic articles in several phases.
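The three evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the balanced F-measure (F1) and represents documents by hypothetical identifiers; the function name `precision_recall_f1` and the sample sets are illustrative only and are not taken from the paper.

```python
def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and balanced F-measure (F1).

    predicted: set of document ids a classifier labelled as academic articles
    actual:    set of document ids that really are academic articles
    Both sets use hypothetical ids; this is not the authors' evaluation code.
    """
    true_positives = len(predicted & actual)
    # Precision: share of predicted articles that are truly academic.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of truly academic articles that were found.
    recall = true_positives / len(actual) if actual else 0.0
    # F1: harmonic mean of precision and recall (0.0 when both are zero).
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up example: four PDFs flagged, three truly academic, two in common.
    p, r, f = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "e"})
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

A classifier that flags more files tends to raise recall at the cost of precision, which is why a single combined figure such as the F-measure is useful when comparing classifiers.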

Original language: English
Pages (from-to): 43-63
Number of pages: 21
Journal: Library and Information Science
Issue number: 56
Publication status: Published - 2006

All Science Journal Classification (ASJC) codes

  • Library and Information Sciences


  • Cite this

    Agata, T., Ikeuchi, A., Ishida, E., Nozue, M., Kuno, T., & Ueda, S. (2006). Automatic identification of academic articles in Japanese PDF files. Library and Information Science, (56), 43-63.