Convolutional Recurrent Neural Networks for Better Image Understanding

Alexis Vallet, Hiroyasu Sakamoto

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

3 Citations (Scopus)

Abstract

Although deep convolutional neural networks have brought basic computer vision tasks to unprecedented accuracy, the best models still struggle to produce higher level image understanding. Indeed, current models for tasks such as visual question answering, often based on recurrent neural networks, have difficulties surpassing baseline methods. We suspect that this is due in part to spatial information in the image not being properly leveraged. We attempt to solve these difficulties by introducing a recurrent unit able to keep and process spatial information throughout the network. On a simple task, we show that our method is significantly more accurate than alternative baselines which discard spatial information. We also demonstrate that higher resolution input performs better than lower resolution input to a surprising degree, even when the input features are less discriminative. Notably, we show that our approach based on higher resolution input is better able to detect details of the images such as the precise number of objects, and the presence of smaller objects, while being less sensitive to biases in the label distribution of the training set.

Original language: English
Title of host publication: 2016 International Conference on Digital Image Computing
Subtitle of host publication: Techniques and Applications, DICTA 2016
Editors: Alan Wee-Chung Liew, Jun Zhou, Yongsheng Gao, Zhiyong Wang, Clinton Fookes, Brian Lovell, Michael Blumenstein
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509028962
Publication status: Published - Dec 22 2016
Event: 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016 - Gold Coast, Australia
Duration: Nov 30 2016 to Dec 2 2016

Publication series

Name: 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016

Other

Other: 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016
Country: Australia
City: Gold Coast
Period: 11/30/16 to 12/2/16

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Software
  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications
