Threshold probability of non-terminal type in finite horizon Markov decision processes

Akifumi Kira, Takayuki Ueno, Toshiharu Fujita

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

We consider a class of problems concerned with maximizing probabilities, given stage-wise targets, which generalizes the standard threshold probability problem in Markov decision processes. The objective function is the probability that, at all stages, the associatively combined accumulation of rewards earned up to that point takes its value in a specified stage-wise interval. It is shown that this class reduces to the case of the nonnegative-valued multiplicative criterion through an invariant imbedding technique. We derive a recursive formula for the optimal value function and an effective method for obtaining the optimal policies.
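
The abstract only sketches the method, but the invariant imbedding step it mentions amounts to augmenting the state with the running accumulation and then applying ordinary backward induction. The following Python sketch illustrates that idea under simplifying assumptions; it is not the paper's formulation. It assumes additive accumulation (one instance of the associative operation the abstract allows), finite state and action sets, and a finite grid acc_values of reachable accumulation values; all names (threshold_probability_dp, trans, reward, intervals) are hypothetical.

from itertools import product

def threshold_probability_dp(states, actions, T, trans, reward, intervals, acc_values):
    """Backward induction on the augmented pair (state, accumulated reward).

    Maximizes the probability that the accumulated reward lies in the
    stage-wise interval intervals[t] = (lo_t, hi_t) at every stage t = 0..T.
    trans[(x, a)] maps each next state y to its transition probability;
    reward[(x, a)] is the stage reward, assumed to keep accumulations on
    the grid acc_values. Illustrative sketch only.
    """
    lo_T, hi_T = intervals[T]
    # Terminal values: 1 if the final accumulation meets the last target.
    V = {(x, s): 1.0 if lo_T <= s <= hi_T else 0.0
         for x, s in product(states, acc_values)}
    policy = {}
    for t in range(T - 1, -1, -1):
        lo, hi = intervals[t]
        V_next, V = V, {}
        for x, s in product(states, acc_values):
            if not (lo <= s <= hi):       # stage-t target already violated
                V[(x, s)] = 0.0
                continue
            best_val, best_act = 0.0, None
            for a in actions:
                s2 = s + reward[(x, a)]   # additive accumulation (special case)
                # Accumulations that leave the grid are treated as failures.
                ev = sum(p * V_next.get((y, s2), 0.0)
                         for y, p in trans[(x, a)].items())
                if ev > best_val:
                    best_val, best_act = ev, a
            V[(x, s)] = best_val
            policy[(t, x, s)] = best_act
    return V, policy

Unrolling this recursion multiplies a 0/1 stage indicator into the continuation value at every stage, so the objective becomes a product of nonnegative terms; this is the nonnegative-valued multiplicative criterion that, per the abstract, the problem reduces to.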

Original language: English
Pages (from-to): 461-472
Number of pages: 12
Journal: Journal of Mathematical Analysis and Applications
Volume: 386
Issue number: 1
DOI: 10.1016/j.jmaa.2011.08.006
Publication status: Published - Feb 1 2012


All Science Journal Classification (ASJC) codes

  • Analysis
  • Applied Mathematics

Cite this

Kira, A., Ueno, T., & Fujita, T. (2012). Threshold probability of non-terminal type in finite horizon Markov decision processes. Journal of Mathematical Analysis and Applications, 386(1), 461-472. https://doi.org/10.1016/j.jmaa.2011.08.006

Scopus record: http://www.scopus.com/inward/record.url?scp=80052842288&partnerID=8YFLogxK

@article{811fb0dc1c37465a9dc0770c18e399a5,
  title     = "Threshold probability of non-terminal type in finite horizon Markov decision processes",
  author    = "Akifumi Kira and Takayuki Ueno and Toshiharu Fujita",
  year      = "2012",
  month     = "2",
  day       = "1",
  doi       = "10.1016/j.jmaa.2011.08.006",
  language  = "English",
  volume    = "386",
  number    = "1",
  pages     = "461--472",
  journal   = "Journal of Mathematical Analysis and Applications",
  issn      = "0022-247X",
  publisher = "Academic Press Inc.",
}
