Contrastive Goal Grouping for Policy Generalization in Goal-Conditioned Reinforcement Learning

Qiming Zou, Einoshin Suzuki

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

We propose Contrastive Goal Grouping (COGOAL), a self-supervised goal embedding algorithm for learning a well-structured latent goal space to simplify goal-conditioned reinforcement learning. Compared to conventional reconstruction-based methods such as the variational autoencoder, our approach can benefit from previously learnt goals and achieve better generalizability. More specifically, we theoretically prove a sufficient condition for determining whether goals share similar optimal policies, and propose COGOAL, which groups goals satisfying the condition in the latent space via contrastive learning. The learnt goal embeddings enable a fully trained policy for one goal to reach new goals that are adjacent in the latent space. We conduct experiments on visual navigation and visual object search tasks. COGOAL significantly outperforms the baseline methods in terms of sample efficiency in the visual object search task, in which a previously learnt policy is adaptively transferred to reach new goals with fine-tuning.
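
As a rough illustration of the contrastive grouping idea described in the abstract, the sketch below computes a generic InfoNCE-style loss over pairs of goal embeddings. It is not the authors' implementation: the loss form, the temperature value, the function names (normalize, info_nce_loss) and the synthetic embeddings are all assumptions made for illustration, and the paper's sufficient condition for pairing goals is replaced here by a simple "small perturbation" stand-in.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (an assumed stand-in, not the paper's exact objective).

    Row i of `positives` is treated as the goal grouped with row i of `anchors`;
    every other row in the batch acts as a negative.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in p]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]  # negative log-probability of the true pair
    return total / len(a)


random.seed(0)
# Hypothetical goal embeddings: 8 goals in a 16-dimensional latent space.
goals = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
# Stand-in for "goals sharing similar optimal policies": slightly perturbed copies.
same_group = [[x + 0.01 * random.gauss(0, 1) for x in g] for g in goals]
# Deliberately wrong pairing: each goal is matched with an unrelated one.
mismatched = list(reversed(goals))

loss_grouped = info_nce_loss(goals, same_group)
loss_mismatched = info_nce_loss(goals, mismatched)
print(loss_grouped < loss_mismatched)
```

Under these assumptions, correctly grouped pairs yield a lower loss than mismatched pairs, which is the pressure that pulls grouped goals together in the latent space during training.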

Original language: English
Title of host publication: Neural Information Processing - 28th International Conference, ICONIP 2021, Proceedings
Editors: Teddy Mantoro, Minho Lee, Media Anugerah Ayu, Kok Wai Wong, Achmad Nizar Hidayanto
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 240-253
Number of pages: 14
ISBN (Print): 9783030921842
DOIs
Publication status: Published - 2021
Event: 28th International Conference on Neural Information Processing, ICONIP 2021 - Virtual, Online
Duration: Dec 8 2021 to Dec 12 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13108 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Neural Information Processing, ICONIP 2021
City: Virtual, Online
Period: 12/8/21 to 12/12/21

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
