Toward Automatic Identification of Dataset Names in Scholarly Articles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As the number of accessible scholarly articles increases, more of them can be read freely. However, scholarly articles are difficult to understand because they are basically written for experts. Our overall goal is to develop methods that extract the essential elements of articles in order to facilitate open innovation. To this end, this paper considers the automatic identification of dataset names in articles. Because a dictionary of datasets is necessary for evaluation, existing methods have focused on a specific discipline. To make our approach applicable to any discipline, we adopt a machine learning approach using a huge number of scholarly papers. Because we handle papers from multiple disciplines, evaluating the experimental results is challenging. To address this, we quantitatively evaluate the results with precision@N, which does not require knowing all the dataset names in the papers we use, and we qualitatively check whether candidate tokens are dataset names using a GUI tool we have developed. About one third of the top 20 tokens output by our method were dataset names, while the others were names of methods, models, or organizations. Removing such noisy results, using the additive compositionality of word vectors, is therefore important future work.
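Precision@N only requires a manual judgment for the top N candidates, which is why it suits a multi-disciplinary corpus where the full list of dataset names is unknown. The following is a minimal sketch of the metric, not the authors' implementation; the function name `precision_at_n` and the example tokens and judgments are hypothetical, chosen only to mirror the kinds of noise (models, methods, organizations) mentioned above.

```python
from typing import Sequence, Set


def precision_at_n(ranked: Sequence[str], judged_datasets: Set[str], n: int) -> float:
    """Return the fraction of the top-n candidates judged to be dataset names.

    Only the top-n candidates need a manual judgment, so the complete set of
    dataset names in the corpus never has to be known.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    top_n = ranked[:n]
    hits = sum(1 for token in top_n if token in judged_datasets)
    # Divide by n (not len(top_n)) so that a list shorter than n is penalized.
    return hits / n


# Hypothetical ranked output and manual judgments, for illustration only.
candidates = ["ImageNet", "ResNet", "MNIST", "IEEE", "CIFAR-10", "LSTM"]
judged = {"ImageNet", "MNIST", "CIFAR-10"}

print(precision_at_n(candidates, judged, 3))  # 2 of the top 3 are datasets
print(precision_at_n(candidates, judged, 6))  # 3 of the top 6 are datasets
```

Under this definition, the reported result of about one third of the top 20 tokens being dataset names corresponds to a precision@20 of roughly 0.33.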

Original language: English
Host publication title: Proceedings - 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 379-382
Number of pages: 4
ISBN (electronic): 9781728126272
DOI: 10.1109/IIAI-AAI.2019.00083
Publication status: Published - Jul 2019
Event: 8th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2019 - Toyama, Japan
Duration: Jul 7 2019 → Jul 11 2019

Publication series

Name: Proceedings - 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019

Conference

Conference: 8th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2019
Country/Territory: Japan
City: Toyama
Period: 7/7/19 → 7/11/19

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Social Sciences (miscellaneous)


  • Cite this

    Ikeda, D., & Taniguchi, Y. (2019). Toward Automatic Identification of Dataset Names in Scholarly Articles. In Proceedings - 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019 (pp. 379-382). [8992699] (Proceedings - 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IIAI-AAI.2019.00083