TY - GEN
T1 - Weighted likelihood policy search with model selection
AU - Ueno, Tsuyoshi
AU - Hayashi, Kohei
AU - Washio, Takashi
AU - Kawahara, Yoshinobu
PY - 2012
Y1 - 2012
N2 - Reinforcement learning (RL) methods based on direct policy search (DPS) have been actively discussed to achieve an efficient approach to complicated Markov decision processes (MDPs). Although they have brought much progress in practical applications of RL, there still remains an unsolved problem in DPS related to model selection for the policy. In this paper, we propose a novel DPS method, weighted likelihood policy search (WLPS), where a policy is efficiently learned through the weighted likelihood estimation. WLPS naturally connects DPS to the statistical inference problem and thus various sophisticated techniques in statistics can be applied to DPS problems directly. Hence, by following the idea of the information criterion, we develop a new measurement for model comparison in DPS based on the weighted log-likelihood.
AB - Reinforcement learning (RL) methods based on direct policy search (DPS) have been actively discussed to achieve an efficient approach to complicated Markov decision processes (MDPs). Although they have brought much progress in practical applications of RL, there still remains an unsolved problem in DPS related to model selection for the policy. In this paper, we propose a novel DPS method, weighted likelihood policy search (WLPS), where a policy is efficiently learned through the weighted likelihood estimation. WLPS naturally connects DPS to the statistical inference problem and thus various sophisticated techniques in statistics can be applied to DPS problems directly. Hence, by following the idea of the information criterion, we develop a new measurement for model comparison in DPS based on the weighted log-likelihood.
UR - http://www.scopus.com/inward/record.url?scp=84877726367&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84877726367&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84877726367
SN - 9781627480031
T3 - Advances in Neural Information Processing Systems
SP - 2357
EP - 2365
BT - Advances in Neural Information Processing Systems 25
T2 - 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
Y2 - 3 December 2012 through 6 December 2012
ER -