TY - GEN
T1 - Contrastive Goal Grouping for Policy Generalization in Goal-Conditioned Reinforcement Learning
AU - Zou, Qiming
AU - Suzuki, Einoshin
N1 - Funding Information:
This work was supported by the China Scholarship Council (Grant No. 202008050300).
Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - We propose Contrastive Goal Grouping (COGOAL), a self-supervised goal embedding algorithm for learning a well-structured latent goal space to simplify goal-conditioned reinforcement learning. Compared to conventional reconstruction-based methods such as variational autoencoders, our approach can benefit from previously learnt goals and achieve better generalizability. More specifically, we theoretically prove a sufficient condition for determining whether goals share similar optimal policies, and propose COGOAL, which groups goals satisfying the condition in the latent space via contrastive learning. The learnt goal embeddings enable a fully trained policy for a goal to reach new goals that are adjacent in the latent space. We conduct experiments on visual navigation and visual object search tasks. COGOAL significantly outperforms the baseline methods in terms of sample efficiency in the visual object search task, in which a previously learnt policy is adaptively transferred to reach new goals with fine-tuning.
AB - We propose Contrastive Goal Grouping (COGOAL), a self-supervised goal embedding algorithm for learning a well-structured latent goal space to simplify goal-conditioned reinforcement learning. Compared to conventional reconstruction-based methods such as variational autoencoders, our approach can benefit from previously learnt goals and achieve better generalizability. More specifically, we theoretically prove a sufficient condition for determining whether goals share similar optimal policies, and propose COGOAL, which groups goals satisfying the condition in the latent space via contrastive learning. The learnt goal embeddings enable a fully trained policy for a goal to reach new goals that are adjacent in the latent space. We conduct experiments on visual navigation and visual object search tasks. COGOAL significantly outperforms the baseline methods in terms of sample efficiency in the visual object search task, in which a previously learnt policy is adaptively transferred to reach new goals with fine-tuning.
UR - http://www.scopus.com/inward/record.url?scp=85121803416&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121803416&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-92185-9_20
DO - 10.1007/978-3-030-92185-9_20
M3 - Conference contribution
AN - SCOPUS:85121803416
SN - 9783030921842
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 240
EP - 253
BT - Neural Information Processing - 28th International Conference, ICONIP 2021, Proceedings
A2 - Mantoro, Teddy
A2 - Lee, Minho
A2 - Ayu, Media Anugerah
A2 - Wong, Kok Wai
A2 - Hidayanto, Achmad Nizar
PB - Springer Science and Business Media Deutschland GmbH
T2 - 28th International Conference on Neural Information Processing, ICONIP 2021
Y2 - 8 December 2021 through 12 December 2021
ER -