Learning multi-task local metrics for image annotation

Xing Xu, Atsushi Shimada, Hajime Nagahara, Rin-Ichiro Taniguchi

Research output: Contribution to journal (article)

6 Citations (Scopus)

Abstract

The goal of image annotation is to automatically assign a set of textual labels to an image to describe the visual contents thereof. Recently, with the rapid increase in the number of web images, nearest neighbor (NN) based methods have become more attractive and have shown exciting results for image annotation. One of the key challenges of these methods is to define an appropriate similarity measure between images for neighbor selection. Several distance metric learning (DML) algorithms derived from traditional image classification problems have been applied to annotation tasks. However, a fundamental limitation of applying DML to image annotation is that it learns a single global distance metric over the entire image collection and measures the distance between image pairs in the image-level. For multi-label annotation problems, it may be more reasonable to measure similarity of image pairs in the label-level. In this paper, we develop a novel label prediction scheme utilizing multiple label-specific local metrics for label-level similarity measure, and propose two different local metric learning methods in a multi-task learning (MTL) framework. Extensive experimental results on two challenging annotation datasets demonstrate that 1) utilizing multiple local distance metrics to learn label-level distances is superior to using a single global metric in label prediction, and 2) the proposed methods using the MTL framework to learn multiple local metrics simultaneously can model the commonalities of labels, thereby facilitating label prediction results to achieve state-of-the-art annotation performance.
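The contrast the abstract draws between one global metric and several label-specific metrics can be illustrated with a small sketch. This is not the authors' algorithm: it assumes the per-label matrices have already been learned by some means, uses a simple exponentially weighted nearest-neighbor vote, and the names `mahalanobis_sq`, `predict_labels`, and `metrics` are hypothetical.

```python
import math


def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y), with M given as nested lists."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def predict_labels(query, train_feats, train_labels, metrics, k=2):
    """Score every label with its own metric (assumed to be learned elsewhere).

    metrics maps label -> matrix M_label. For each label, the k nearest
    training images under M_label vote, weighted by exp(-distance).
    """
    scores = {}
    for label, M in metrics.items():
        # Pair each training image's distance with whether it carries the label.
        dists = sorted(
            (mahalanobis_sq(query, feat, M), label in labs)
            for feat, labs in zip(train_feats, train_labels)
        )
        nearest = dists[:k]
        weights = [math.exp(-dist) for dist, _ in nearest]
        total = sum(weights)
        positive = sum(w for w, (_, has) in zip(weights, nearest) if has)
        # Guard against underflow when every neighbor is extremely far away.
        scores[label] = positive / total if total > 0 else 0.0
    return scores


if __name__ == "__main__":
    feats = [[0.0, 1.0], [1.0, 0.0]]
    labels = [{"sky"}, {"grass"}]
    # Hypothetical metrics: "sky" compares only the first feature,
    # "grass" compares only the second.
    metrics = {
        "sky": [[1.0, 0.0], [0.0, 0.0]],
        "grass": [[0.0, 0.0], [0.0, 1.0]],
    }
    print(predict_labels([0.0, 0.0], feats, labels, metrics, k=1))
```

Because each label ranks neighbors under its own matrix, two labels can select different neighbors for the same query image, which a single global metric cannot do.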

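Original language: English
Pages (from-to): 2203-2231
Number of pages: 29
Journal: Multimedia Tools and Applications
Volume: 75
Issue number: 4
DOI: 10.1007/s11042-014-2402-7
Publication status: Published - Feb 1 2016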

Fingerprint

Labels
Image classification
Learning algorithms

All Science Journal Classification (ASJC) codes

  • Software
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Learning multi-task local metrics for image annotation. / Xu, Xing; Shimada, Atsushi; Nagahara, Hajime; Taniguchi, Rin-Ichiro.

In: Multimedia Tools and Applications, Vol. 75, No. 4, 01.02.2016, p. 2203-2231.

Research output: Contribution to journal (article)

@article{3286a668bf4d40fdb0d7c674a77143f6,
title = "Learning multi-task local metrics for image annotation",
author = "Xing Xu and Atsushi Shimada and Hajime Nagahara and Rin-Ichiro Taniguchi",
year = "2016",
month = "2",
day = "1",
doi = "10.1007/s11042-014-2402-7",
language = "English",
volume = "75",
pages = "2203--2231",
journal = "Multimedia Tools and Applications",
issn = "1380-7501",
publisher = "Springer Netherlands",
number = "4",

}

TY - JOUR

T1 - Learning multi-task local metrics for image annotation

AU - Xu, Xing

AU - Shimada, Atsushi

AU - Nagahara, Hajime

AU - Taniguchi, Rin-Ichiro

PY - 2016/2/1

Y1 - 2016/2/1

AB - The goal of image annotation is to automatically assign a set of textual labels to an image to describe the visual contents thereof. Recently, with the rapid increase in the number of web images, nearest neighbor (NN) based methods have become more attractive and have shown exciting results for image annotation. One of the key challenges of these methods is to define an appropriate similarity measure between images for neighbor selection. Several distance metric learning (DML) algorithms derived from traditional image classification problems have been applied to annotation tasks. However, a fundamental limitation of applying DML to image annotation is that it learns a single global distance metric over the entire image collection and measures the distance between image pairs in the image-level. For multi-label annotation problems, it may be more reasonable to measure similarity of image pairs in the label-level. In this paper, we develop a novel label prediction scheme utilizing multiple label-specific local metrics for label-level similarity measure, and propose two different local metric learning methods in a multi-task learning (MTL) framework. Extensive experimental results on two challenging annotation datasets demonstrate that 1) utilizing multiple local distance metrics to learn label-level distances is superior to using a single global metric in label prediction, and 2) the proposed methods using the MTL framework to learn multiple local metrics simultaneously can model the commonalities of labels, thereby facilitating label prediction results to achieve state-of-the-art annotation performance.

UR - http://www.scopus.com/inward/record.url?scp=84959146193&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959146193&partnerID=8YFLogxK

U2 - 10.1007/s11042-014-2402-7

DO - 10.1007/s11042-014-2402-7

M3 - Article

VL - 75

SP - 2203

EP - 2231

JO - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

SN - 1380-7501

IS - 4

ER -