Reinforcement learning is a kind of machine learning. Profit Sharing, the Rational Policy Making algorithm (RPM), the Penalty Avoiding Rational Policy Making algorithm, and PS-r* are known to guarantee rationality in a typical class of Partially Observable Markov Decision Processes. However, they cannot handle continuous state spaces. In this paper, we present a method for adapting them to continuous state spaces. We give RPM a mechanism for handling continuous state spaces in environments that have a single type of reward. We demonstrate the effectiveness of the proposed method through numerical examples.
|Number of pages|10|
|Journal|Transactions of the Japanese Society for Artificial Intelligence|
|Publication status|Published - Jan 1 2007|
All Science Journal Classification (ASJC) codes
- Artificial Intelligence