Convolutional Recurrent Neural Networks for Better Image Understanding

Alexis Vallet, Hiroyasu Sakamoto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Although deep convolutional neural networks have brought basic computer vision tasks to unprecedented accuracy, the best models still struggle to produce higher level image understanding. Indeed, current models for tasks such as visual question answering, often based on recurrent neural networks, have difficulties surpassing baseline methods. We suspect that this is due in part to spatial information in the image not being properly leveraged. We attempt to solve these difficulties by introducing a recurrent unit able to keep and process spatial information throughout the network. On a simple task, we show that our method is significantly more accurate than alternative baselines which discard spatial information. We also demonstrate that higher resolution input performs better than lower resolution input to a surprising degree, even when the input features are less discriminative. Notably, we show that our approach based on higher resolution input is better able to detect details of the images such as the precise number of objects, and the presence of smaller objects, while being less sensitive to biases in the label distribution of the training set.
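For context, the core idea described in the abstract (a recurrent unit that keeps and processes spatial information throughout the network) can be illustrated with a convolutional GRU-style cell, in which the gates are computed by 2-D convolutions so the hidden state retains its height-by-width layout instead of being flattened into a vector. The sketch below is a generic PyTorch illustration, not the authors' exact unit; the class name ConvGRUCell, the kernel size, and all other hyperparameters are assumptions.

# Minimal sketch of a convolutional recurrent (GRU-style) cell.
# Not the paper's exact formulation; gates and candidate state are
# computed with 2-D convolutions so the hidden state stays spatial.
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2  # preserve the spatial resolution
        # Reset and update gates, computed jointly from [input, hidden].
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               2 * hidden_channels, kernel_size, padding=padding)
        # Candidate hidden state, computed from [input, reset * hidden].
        self.candidate = nn.Conv2d(in_channels + hidden_channels,
                                   hidden_channels, kernel_size, padding=padding)
        self.hidden_channels = hidden_channels

    def forward(self, x, h=None):
        # x: (batch, in_channels, H, W); h: (batch, hidden_channels, H, W)
        if h is None:
            h = x.new_zeros(x.size(0), self.hidden_channels, x.size(2), x.size(3))
        stacked = torch.cat([x, h], dim=1)
        reset, update = torch.sigmoid(self.gates(stacked)).chunk(2, dim=1)
        cand = torch.tanh(self.candidate(torch.cat([x, reset * h], dim=1)))
        # Convex combination of previous state and candidate, per spatial location.
        return (1 - update) * h + update * cand

Applying such a cell repeatedly over convolutional feature maps would let the network aggregate evidence while keeping track of where it appears in the image, which is consistent with the abstract's observation that higher resolution inputs help detect small objects and exact object counts.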

Original language: English
Title of host publication: 2016 International Conference on Digital Image Computing
Subtitle of host publication: Techniques and Applications, DICTA 2016
Editors: Alan Wee-Chung Liew, Jun Zhou, Yongsheng Gao, Zhiyong Wang, Clinton Fookes, Brian Lovell, Michael Blumenstein
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509028962
DOI: 10.1109/DICTA.2016.7797026
Publication status: Published - Dec 22, 2016
Event: 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016 - Gold Coast, Australia
Duration: Nov 30, 2016 - Dec 2, 2016

Other

Other: 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016
Country: Australia
City: Gold Coast
Period: 11/30/16 - 12/2/16

Fingerprint

Image understanding
Recurrent neural networks
Computer vision
Labels
Neural networks

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Software
  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications

Cite this

Vallet, A., & Sakamoto, H. (2016). Convolutional Recurrent Neural Networks for Better Image Understanding. In A. W-C. Liew, J. Zhou, Y. Gao, Z. Wang, C. Fookes, B. Lovell, & M. Blumenstein (Eds.), 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016 [7797026]. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DICTA.2016.7797026
