A multi-label convolutional neural network for automatic image annotation

Alexis Vallet, Hiroyasu Sakamoto

Research output: Contribution to journal (Article)

6 Citations (Scopus)

Abstract

Over the past few years, convolutional neural networks (CNNs) have set the state of the art in a wide variety of supervised computer vision problems. Most research effort has focused on single-label classification, due to the availability of the large-scale ImageNet dataset. Via pre-training on this dataset, CNNs have also shown the ability to outperform traditional methods for multi-label classification. Such methods, however, typically require evaluating many expensive forward passes to produce a multi-label distribution. Furthermore, due to the lack of a large-scale multi-label dataset, little effort has been invested into training CNNs from scratch with multi-label data. In this paper, we address both issues by introducing a multi-label cost function adequate for deep CNNs, and a prediction method requiring only a single forward pass to produce multi-label predictions. We show the performance of our method on a newly introduced large-scale multi-label dataset of animation images. Here, our method reaches 75.1% precision and 66.5% accuracy, making it suitable for automated annotation in practice. Additionally, we apply our method to the Pascal VOC 2007 dataset of natural images, and show that our prediction method outperforms a comparable model at a fraction of the computational cost.
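The abstract does not spell out the cost function or the prediction rule, so the following is only a minimal sketch of what a single-pass multi-label setup commonly looks like, not the authors' method. It assumes independent sigmoid outputs per label, a binary cross-entropy cost, and a fixed decision threshold; the function names, the threshold of 0.5, and the example logits are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent per-label outputs.

    Assumption: a common stand-in for a multi-label cost function,
    not the cost function introduced in the paper.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1.0 - p + eps)
    return total / len(logits)


def predict_labels(logits, threshold=0.5):
    """Turn the logits of one forward pass into a set of label indices."""
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]


def precision(predicted, actual):
    """Fraction of predicted labels that appear in the ground truth."""
    if not predicted:
        return 0.0
    actual = set(actual)
    return sum(1 for i in predicted if i in actual) / len(predicted)


# Hypothetical logits for a four-label problem; only label 0 is correct.
logits = [2.0, -1.0, 0.5, -3.0]
targets = [1, 0, 0, 0]

predicted = predict_labels(logits)
print(predicted)                      # indices whose probability is >= 0.5
print(precision(predicted, [0]))      # share of predicted labels that are right
print(multilabel_bce(logits, targets))
```

Because every label gets its own probability from the same output vector, the full label set comes out of one forward pass, which is the property the abstract contrasts with approaches that need many passes per image.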

Original language: English
Pages (from-to): 767-775
Number of pages: 9
Journal: Journal of Information Processing
Volume: 23
Issue number: 6
DOI: 10.2197/ipsjjip.23.767
Publication status: Published - Nov 15 2015

Fingerprint

Labels
Neural networks
Animation
Cost functions
Computer vision
Availability
Costs

All Science Journal Classification (ASJC) codes

  • Computer Science (all)

Cite this

A multi-label convolutional neural network for automatic image annotation. / Vallet, Alexis; Sakamoto, Hiroyasu.

In: Journal of Information Processing, Vol. 23, No. 6, 15.11.2015, p. 767-775.

Vallet, Alexis; Sakamoto, Hiroyasu. / A multi-label convolutional neural network for automatic image annotation. In: Journal of Information Processing. 2015; Vol. 23, No. 6. pp. 767-775.
@article{ae6d597852b6460894299d9706854f87,
title = "A multi-label convolutional neural network for automatic image annotation",
author = "Alexis Vallet and Hiroyasu Sakamoto",
year = "2015",
month = "11",
day = "15",
doi = "10.2197/ipsjjip.23.767",
language = "English",
volume = "23",
pages = "767--775",
journal = "Journal of Information Processing",
issn = "0387-6101",
publisher = "Information Processing Society of Japan",
number = "6",

}

TY  - JOUR
T1  - A multi-label convolutional neural network for automatic image annotation
AU  - Vallet, Alexis
AU  - Sakamoto, Hiroyasu
PY  - 2015/11/15
Y1  - 2015/11/15
UR  - http://www.scopus.com/inward/record.url?scp=84947245532&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84947245532&partnerID=8YFLogxK
U2  - 10.2197/ipsjjip.23.767
DO  - 10.2197/ipsjjip.23.767
M3  - Article
AN  - SCOPUS:84947245532
VL  - 23
SP  - 767
EP  - 775
JO  - Journal of Information Processing
JF  - Journal of Information Processing
SN  - 0387-6101
IS  - 6
ER  -