Weighted likelihood policy search with model selection

Tsuyoshi Ueno, Kohei Hayashi, Takashi Washio, Yoshinobu Kawahara

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

Reinforcement learning (RL) methods based on direct policy search (DPS) have been actively discussed to achieve an efficient approach to complicated Markov decision processes (MDPs). Although they have brought much progress in prac- tical applications of RL, there still remains an unsolved problem in DPS related to model selection for the policy. In this paper, we propose a novel DPS method, weighted likelihood policy search (WLPS), where a policy is efficiently learned through the weighted likelihood estimation. WLPS naturally connects DPS to the statistical inference problem and thus various sophisticated techniques in statis- tics can be applied to DPS problems directly. Hence, by following the idea of the information criterion, we develop a new measurement for model comparison in DPS based on the weighted log-likelihood.

Original languageEnglish
Title of host publicationAdvances in Neural Information Processing Systems 25
Subtitle of host publication26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
Pages2357-2365
Number of pages9
Publication statusPublished - Dec 1 2012
Externally publishedYes
Event26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012 - Lake Tahoe, NV, United States
Duration: Dec 3 2012Dec 6 2012

Publication series

NameAdvances in Neural Information Processing Systems
Volume3
ISSN (Print)1049-5258

Conference

Conference26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
CountryUnited States
CityLake Tahoe, NV
Period12/3/1212/6/12

    Fingerprint

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Ueno, T., Hayashi, K., Washio, T., & Kawahara, Y. (2012). Weighted likelihood policy search with model selection. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012 (pp. 2357-2365). (Advances in Neural Information Processing Systems; Vol. 3).