Fourth-Person Captioning: Describing Daily Events by Uni-supervised and Tri-regularized Training

Kazuto Nakashima, Yumi Iwashita, Akihiro Kawamura, Ryo Kurazume

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We aim to develop a support system that enhances a human's short-term visual memory in an intelligent space where the human and a service robot coexist. In particular, this paper focuses on how to interpret and record diverse and complex life events on behalf of humans from a multi-perspective viewpoint. We propose a novel method named 'fourth-person captioning', which generates natural language descriptions by complementarily summarizing visual contexts from three types of cameras corresponding to the first-, second-, and third-person viewpoints. We first extend the latest image captioning techniques and design a new model that generates a sequence of words given multiple images. We then provide an effective training strategy that requires only annotations supervising images from a single viewpoint in a general caption dataset, together with unsupervised triplet instances collected in the intelligent space. As the three types of cameras, we select a wearable camera on the human, a robot-mounted camera, and an embedded camera, which can be defined as the first-, second-, and third-person viewpoints, respectively. We hope our work will accelerate cross-modal interaction bridging human egocentric cognition and multi-perspective intelligence.
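The architecture the abstract outlines (per-viewpoint image encoders whose features are fused to condition a word-sequence decoder) might be sketched as follows. This is an illustrative reconstruction, not the authors' code: the layer sizes, the small CNN encoders, and fusion by concatenation are all assumptions.

```python
# Minimal sketch of a "fourth-person" captioner: three viewpoint images
# (first-, second-, third-person) are encoded separately, fused, and used
# to initialize an LSTM that decodes a caption token by token.
import torch
import torch.nn as nn


class FourthPersonCaptioner(nn.Module):
    def __init__(self, vocab_size, feat_dim=128, embed_dim=64,
                 hidden_dim=128, num_views=3):
        super().__init__()
        # One small CNN encoder per viewpoint (illustrative; the paper
        # builds on a modern image-captioning backbone instead).
        self.encoders = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim),
            )
            for _ in range(num_views)
        ])
        # Fuse the per-view features by concatenation, then project them
        # to the decoder's initial hidden state.
        self.fuse = nn.Linear(num_views * feat_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, views, captions):
        # views: list of num_views tensors, each (B, 3, H, W)
        # captions: (B, T) token ids, used with teacher forcing
        feats = torch.cat(
            [enc(v) for enc, v in zip(self.encoders, views)], dim=1)
        h0 = torch.tanh(self.fuse(feats)).unsqueeze(0)  # (1, B, hidden)
        c0 = torch.zeros_like(h0)
        hidden, _ = self.lstm(self.embed(captions), (h0, c0))
        return self.out(hidden)                          # (B, T, vocab)


model = FourthPersonCaptioner(vocab_size=100)
views = [torch.randn(2, 3, 64, 64) for _ in range(3)]    # 3 viewpoints
captions = torch.randint(0, 100, (2, 7))
logits = model(views, captions)
print(tuple(logits.shape))  # (2, 7, 100): per-step vocabulary scores
```

Training such a model with single-viewpoint caption supervision plus unpaired multi-view triplets, as the abstract describes, would add regularization terms tying the three view features together; those are omitted here.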

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2122-2127
Number of pages: 6
ISBN (Electronic): 9781538666500
DOIs: 10.1109/SMC.2018.00365
Publication status: Published - Jan 16, 2019
Event: 2018 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2018 - Miyazaki, Japan
Duration: Oct 7, 2018 to Oct 10, 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2018

Conference

Conference: 2018 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2018
Country: Japan
City: Miyazaki
Period: 10/7/18 to 10/10/18

Fingerprint

  • Cameras
  • Robots
  • Aptitude
  • Intelligence
  • Short-Term Memory
  • Cognition
  • Language
  • Data storage equipment
  • Robot

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Information Systems and Management
  • Health Informatics
  • Artificial Intelligence
  • Computer Networks and Communications
  • Human-Computer Interaction

Cite this

Nakashima, K., Iwashita, Y., Kawamura, A., & Kurazume, R. (2019). Fourth-Person Captioning: Describing Daily Events by Uni-supervised and Tri-regularized Training. In Proceedings - 2018 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2018 (pp. 2122-2127). [8616361] (Proceedings - 2018 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SMC.2018.00365


TY - GEN

T1 - Fourth-Person Captioning

T2 - Describing Daily Events by Uni-supervised and Tri-regularized Training

AU - Nakashima, Kazuto

AU - Iwashita, Yumi

AU - Kawamura, Akihiro

AU - Kurazume, Ryo

PY - 2019/1/16

Y1 - 2019/1/16

N2 - We aim to develop a support system that enhances a human's short-term visual memory in an intelligent space where the human and a service robot coexist. In particular, this paper focuses on how to interpret and record diverse and complex life events on behalf of humans from a multi-perspective viewpoint. We propose a novel method named 'fourth-person captioning', which generates natural language descriptions by complementarily summarizing visual contexts from three types of cameras corresponding to the first-, second-, and third-person viewpoints. We first extend the latest image captioning techniques and design a new model that generates a sequence of words given multiple images. We then provide an effective training strategy that requires only annotations supervising images from a single viewpoint in a general caption dataset, together with unsupervised triplet instances collected in the intelligent space. As the three types of cameras, we select a wearable camera on the human, a robot-mounted camera, and an embedded camera, which can be defined as the first-, second-, and third-person viewpoints, respectively. We hope our work will accelerate cross-modal interaction bridging human egocentric cognition and multi-perspective intelligence.

UR - http://www.scopus.com/inward/record.url?scp=85062231854&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062231854&partnerID=8YFLogxK

U2 - 10.1109/SMC.2018.00365

DO - 10.1109/SMC.2018.00365

M3 - Conference contribution

AN - SCOPUS:85062231854

T3 - Proceedings - 2018 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2018

SP - 2122

EP - 2127

BT - Proceedings - 2018 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -