A modality converting approach for image annotation to overcome the inconsistent labels in training data

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automatic image annotation (AIA), in which a system assigns descriptive keywords to an input image, has long been studied as a shared task and remains important because annotation keywords give users efficient access to ever-growing image data. However, the performance of current AIA systems remains low. One difficulty for many supervised methods is the inconsistency of annotation keywords in the training data, which arises naturally in manual annotation. For example, an image of a person may be annotated “tourist” or “woman” depending on the scene. This inconsistency makes it difficult to annotate images to which several such similar keywords could apply. To address this difficulty, we propose a modality-converting method that transforms an input image into encyclopedic text describing the keywords assigned to the image. With this conversion, similar keywords can share features derived from their texts. In the proposed method, we pair images with the Wikipedia articles whose titles are their annotation keywords. Using the paired data, we train a neural network as a modality converter from images to Wikipedia text. The method then classifies the converted text into annotation keywords, in the manner of text classification. Experimental results show that our method, based on the converted text, performs relatively well compared with existing methods.
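
The pipeline described in the abstract (pair images with keyword-titled articles, learn an image-to-text converter, then classify the converted text) can be illustrated with a toy sketch. This is not the authors' implementation: the article texts, the 3-dimensional image features, and all function names are hypothetical, a plain linear map trained by gradient descent stands in for the neural network, and nearest-article cosine similarity stands in for the text classifier.

```python
import math

# Hypothetical stand-ins for Wikipedia articles, keyed by annotation keyword.
ARTICLES = {
    "tourist": "person travels pleasure visits places",
    "woman": "adult female person",
    "mountain": "large natural elevation earth surface",
}

# Hypothetical image features paired with their manual annotation keyword.
TRAIN = [
    ([1.0, 0.2, 0.0], "tourist"),
    ([0.8, 0.0, 0.6], "woman"),
    ([0.0, 1.0, 0.1], "mountain"),
]

VOCAB = sorted({w for text in ARTICLES.values() for w in text.split()})


def text_vector(text):
    """Bag-of-words count vector over VOCAB."""
    words = text.split()
    return [float(words.count(w)) for w in VOCAB]


ARTICLE_VECS = {k: text_vector(t) for k, t in ARTICLES.items()}


def convert(weights, features):
    """Modality conversion: map image features to a text-space vector."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def train_converter(pairs, epochs=500, lr=0.2):
    """Fit a linear image-to-text map by per-sample gradient descent
    on the squared error (an assumed stand-in for the neural network)."""
    dim = len(pairs[0][0])
    weights = [[0.0] * dim for _ in VOCAB]
    for _ in range(epochs):
        for features, keyword in pairs:
            target = ARTICLE_VECS[keyword]
            predicted = convert(weights, features)
            for i, row in enumerate(weights):
                err = predicted[i] - target[i]
                for j in range(dim):
                    row[j] -= lr * err * features[j]
    return weights


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def annotate(weights, features):
    """Rank keywords by similarity between converted text and each article."""
    converted = convert(weights, features)
    scores = {k: cosine(converted, v) for k, v in ARTICLE_VECS.items()}
    return sorted(scores, key=scores.get, reverse=True)


WEIGHTS = train_converter(TRAIN)
print(annotate(WEIGHTS, [1.0, 0.2, 0.0]))
```

In this toy setup the "tourist" and "woman" articles share the word "person", so an image converted to tourist-like text still scores above zero for "woman". That is the kind of feature sharing between similar keywords that the abstract attributes to the text modality.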

Original language: English
Title of host publication: Pattern Recognition - ICPR International Workshops and Challenges, Proceedings
Editors: Alberto Del Bimbo, Marco Bertini, Stan Sclaroff, Tao Mei, Hugo Jair Escalante, Rita Cucchiara, Roberto Vezzani, Giovanni Maria Farinella
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 261-268
Number of pages: 8
ISBN (Print): 9783030687892
DOIs
Publication status: Published - 2021
Event: 25th International Conference on Pattern Recognition Workshops, ICPR 2020 - Virtual, Online
Duration: Jan 10 2021 to Jan 15 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12662 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 25th International Conference on Pattern Recognition Workshops, ICPR 2020
City: Virtual, Online
Period: 1/10/21 to 1/15/21

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)
