TD algorithm for the variance of return and mean-variance reinforcement learning

Makoto Sato, Hajime Kimura, Shigenobu Kobayashi

Research output: Contribution to journal › Article › peer-review

20 citations (Scopus)

Abstract

Estimating probability distributions over returns enables various sophisticated decision-making schemes for control problems in Markov environments, including risk-sensitive control and efficient exploration. Many reinforcement learning algorithms, however, rely only on the expected return. This paper provides a decision-making scheme that uses both the mean and the variance of return distributions. It presents a TD algorithm for estimating the variance of return in MDP (Markov decision process) environments, together with a gradient-based reinforcement learning algorithm for the variance-penalized criterion, a typical criterion in risk-avoiding control. Empirical results demonstrate the behavior of the algorithms and validate the criterion for risk-avoiding sequential decision tasks.
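The abstract's idea of TD-learning the variance of return can be sketched with one standard second-moment recursion: learn V(s) = E[G|s] and M(s) = E[G²|s] by bootstrapping, then set Var(s) = M(s) − V(s)². The toy chain MDP, step sizes, and variable names below are illustrative assumptions, not taken from the paper, and this is only one common formulation of a variance TD update, not necessarily the authors' exact algorithm.

```python
import random

random.seed(0)

# Illustrative two-state chain MDP (an assumption, not from the paper):
# state 0 gives reward 0 and moves to state 1; state 1 gives reward 0 or 2
# (each with probability 0.5) and terminates.
# True values with gamma = 1: V(0) = V(1) = 1 and Var(0) = Var(1) = 1.
GAMMA = 1.0
ALPHA = 0.02

V = [0.0, 0.0]  # estimate of E[G | s], the first moment of the return
M = [0.0, 0.0]  # estimate of E[G^2 | s], the second moment of the return

for _ in range(20000):
    # Transition from state 0 to state 1 (reward 0, non-terminal):
    # bootstrap the second moment with E[(r + gamma*G')^2]
    #   = r^2 + 2*gamma*r*V(s') + gamma^2 * M(s').
    s, r, s_next = 0, 0.0, 1
    V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])
    M[s] += ALPHA * (r * r + 2 * GAMMA * r * V[s_next]
                     + GAMMA ** 2 * M[s_next] - M[s])

    # Transition from state 1 to a terminal state (random reward,
    # so the TD targets reduce to r and r^2).
    s, r = 1, random.choice([0.0, 2.0])
    V[s] += ALPHA * (r - V[s])
    M[s] += ALPHA * (r * r - M[s])

# Variance of return recovered from the two learned moments.
var_est = [M[i] - V[i] ** 2 for i in range(2)]
```

Both moment estimates settle near their true values (V ≈ 1, Var ≈ 1 in each state), and a variance-penalized criterion of the kind the paper studies would then score a state as V(s) − κ·Var(s) for some risk-aversion weight κ.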

Original language: English
Pages (from-to): 353-362
Number of pages: 10
Journal: Transactions of the Japanese Society for Artificial Intelligence
Volume: 16
Issue number: 3
DOI
Publication status: Published - 2001
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
