Toward Automatic Identification of Dataset Names in Scholarly Articles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As the number of accessible scholarly articles grows, it is becoming possible for anyone to read them freely. However, scholarly articles are difficult to understand because they are written primarily for experts. Our overarching goal is to develop methods that extract the essential elements of articles and thereby facilitate open innovation. To this end, this paper considers the automatic identification of dataset names in articles. Because a dictionary of datasets is necessary for evaluation, existing methods have focused on specific disciplines. To achieve applicability to any discipline, we adopt a machine learning approach using a large number of scholarly papers. Because we treat papers from multiple disciplines, evaluating the experimental results is challenging. To address this, we evaluate the results quantitatively with precision@N, which does not require knowing all the dataset names in the papers we use, and qualitatively check whether candidate tokens are dataset names using a GUI tool we have developed. About one third of the top 20 tokens output by our method were dataset names; the others were names of methods, models, or organizations. Removing such noisy results, using the additive compositionality of word vectors, is therefore important future work.
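The precision@N measure mentioned in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `precision_at_n`, the ranked token list, and the set of tokens judged to be dataset names are all hypothetical toy values. It shows why the measure needs judgments only for the top N candidates, not a complete dictionary of dataset names.

```python
def precision_at_n(ranked_candidates, judged_positive, n):
    """Return the fraction of the top-n ranked candidates judged to be dataset names.

    Only the top-n candidates need a human judgment, so no exhaustive
    list of dataset names is required.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    top_n = ranked_candidates[:n]
    hits = sum(1 for token in top_n if token in judged_positive)
    # Standard precision@N divides by N, even if fewer candidates exist.
    return hits / n


# Hypothetical ranked output of an identification method (toy data).
ranked = ["MNIST", "SVM", "ImageNet", "Google", "BERT", "CIFAR-10"]
# Hypothetical manual judgments: tokens confirmed as dataset names.
judged = {"MNIST", "ImageNet", "CIFAR-10"}

p_at_3 = precision_at_n(ranked, judged, 3)
p_at_6 = precision_at_n(ranked, judged, 6)
print(f"precision@3 = {p_at_3:.3f}")
print(f"precision@6 = {p_at_6:.3f}")
```

In this toy ranking the tokens that are not dataset names are a method, an organization, and a model, the same kinds of noise the abstract reports among the top 20 outputs.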

Original language: English
Title of host publication: Proceedings - 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 379-382
Number of pages: 4
ISBN (Electronic): 9781728126272
DOI
Publication status: Published - Jul 2019
Event: 8th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2019 - Toyama, Japan
Duration: Jul 7 2019 to Jul 11 2019

Publication series

Name: Proceedings - 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019

Conference

Conference: 8th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2019
Country/Territory: Japan
City: Toyama
Period: 7/7/19 to 7/11/19

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Social Sciences (miscellaneous)
