Learning Curves for Automating Content Analysis: How Much Human Annotation is Needed?

Emi Ishita, Douglas W. Oard, Kenneth R. Fleischmann, Yoichi Tomiura, Yasuhiro Takayama, An Shou Cheng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we explore the potential for reducing human effort when coding text segments for use in content analysis. The key idea is to do some coding by hand, to use the results of that initial effort as training data, and then to code the remainder of the content automatically. The test collection includes 102 written prepared statements about Net neutrality from public hearings held by the U.S. Congress and the U.S. Federal Communications Commission (FCC). Six categories were used in this analysis: wealth, social order, justice, freedom, innovation, and honor. A support vector machine (SVM) classifier and a Naïve Bayes (NB) classifier were trained on manually annotated sentences from between one and 51 documents and tested on a held-out set of 51 documents. The results show that the inflection point for a standard measure of classifier accuracy (F1) occurs early, reaching at least 85% of the best achievable result for the SVM classifier with only 30 training documents, and at least 88% of the best achievable result for the NB classifier with only 30 training documents. With the exception of honor, the results show that machine classification could reasonably be scaled up to larger collections of similar documents without additional human annotation effort.
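
The "percentage of the best achievable result" figure in the abstract can be read as a point on a learning curve: F1 at a given training size divided by the best F1 on the curve. The following is a minimal sketch of that arithmetic, not the authors' code. The function names and the `toy_counts` values are hypothetical and are not results from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall, computed from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def smallest_sufficient_size(curve, fraction):
    """Smallest training size whose F1 reaches `fraction` of the best F1 on the curve.

    `curve` maps the number of training documents to F1 on the held-out set.
    """
    best = max(curve.values())
    for size in sorted(curve):
        if curve[size] >= fraction * best:
            return size
    return None


# Made-up (true positive, false positive, false negative) counts,
# for illustration only. These are NOT results from the paper.
toy_counts = {1: (10, 30, 90), 10: (40, 30, 60), 30: (60, 30, 40), 51: (70, 30, 30)}
toy_curve = {n: f1_score(*counts) for n, counts in toy_counts.items()}

print(smallest_sufficient_size(toy_curve, 0.85))
```

With these invented counts, the curve first reaches 85% of its best F1 at 30 training documents. Real curves would come from retraining the classifier at each training size and scoring it on the same held-out set.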

Original language: English
Title of host publication: Proceedings - 2015 IIAI 4th International Congress on Advanced Applied Informatics, IIAI-AAI 2015
Editors: Sachio Hirokawa, Kiyota Hashimoto, Tokuro Matsuo, Tsunenori Mine
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 171-176
Number of pages: 6
ISBN (Electronic): 9781479999583
DOI
Publication status: Published - Jan 6, 2016
Event: 4th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2015 - Okayama, Japan
Duration: Jul 12, 2015 to Jul 16, 2015

Publication series

Name: Proceedings - 2015 IIAI 4th International Congress on Advanced Applied Informatics, IIAI-AAI 2015

Other

Other: 4th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2015
Country/Territory: Japan
City: Okayama
Period: 7/12/15 to 7/16/15

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Networks and Communications
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
