Convolutional Recurrent Neural Networks for Better Image Understanding

Alexis Vallet, Hiroyasu Sakamoto

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

3 Citations (Scopus)

Abstract

Although deep convolutional neural networks have brought basic computer vision tasks to unprecedented accuracy, the best models still struggle to produce higher level image understanding. Indeed, current models for tasks such as visual question answering, often based on recurrent neural networks, have difficulties surpassing baseline methods. We suspect that this is due in part to spatial information in the image not being properly leveraged. We attempt to solve these difficulties by introducing a recurrent unit able to keep and process spatial information throughout the network. On a simple task, we show that our method is significantly more accurate than alternative baselines which discard spatial information. We also demonstrate that higher resolution input performs better than lower resolution input to a surprising degree, even when the input features are less discriminative. Notably, we show that our approach based on higher resolution input is better able to detect details of the images such as the precise number of objects, and the presence of smaller objects, while being less sensitive to biases in the label distribution of the training set.

Original language: English
Title of host publication: 2016 International Conference on Digital Image Computing
Subtitle of host publication: Techniques and Applications, DICTA 2016
Editors: Alan Wee-Chung Liew, Jun Zhou, Yongsheng Gao, Zhiyong Wang, Clinton Fookes, Brian Lovell, Michael Blumenstein
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509028962
Publication status: Published - Dec 22 2016
Event: 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016 - Gold Coast, Australia
Duration: Nov 30 2016 to Dec 2 2016

Publication series

Name: 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016

Other

Other: 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016
Country: Australia
City: Gold Coast
Period: 11/30/16 to 12/2/16

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Software
  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications
