TY - GEN
T1 - Multiagent planning with trembling-hand perfect equilibrium in multiagent POMDPs
AU - Yabu, Yuichi
AU - Yokoo, Makoto
AU - Iwasaki, Atsushi
PY - 2009
Y1 - 2009
N2 - Multiagent Partially Observable Markov Decision Processes are a popular model of multiagent systems with uncertainty. Since the computational cost of finding an optimal joint policy is prohibitive, Joint Equilibrium-based Search for Policies with Nash Equilibrium (JESP-NE) has been proposed; it finds a locally optimal joint policy in which each policy is a best response to the other policies, i.e., the joint policy is a Nash equilibrium. One limitation of JESP-NE is that the quality of the obtained joint policy depends on a predefined default policy. More specifically, when finding a best response, if some observations have zero probability, JESP-NE falls back on this default policy. If the default policy is quite bad, JESP-NE tends to converge to a sub-optimal joint policy. In this paper, we propose a method that finds a locally optimal joint policy based on a concept called Trembling-hand Perfect Equilibrium (TPE). In finding a TPE, we assume that an agent might make a mistake in selecting its action with small probability. Thus, an observation that has zero probability in JESP-NE has non-zero probability, and the default policy is no longer needed. As a result, JESP-TPE can converge to a better joint policy than JESP-NE, a fact we confirm by experimental evaluations.
AB - Multiagent Partially Observable Markov Decision Processes are a popular model of multiagent systems with uncertainty. Since the computational cost of finding an optimal joint policy is prohibitive, Joint Equilibrium-based Search for Policies with Nash Equilibrium (JESP-NE) has been proposed; it finds a locally optimal joint policy in which each policy is a best response to the other policies, i.e., the joint policy is a Nash equilibrium. One limitation of JESP-NE is that the quality of the obtained joint policy depends on a predefined default policy. More specifically, when finding a best response, if some observations have zero probability, JESP-NE falls back on this default policy. If the default policy is quite bad, JESP-NE tends to converge to a sub-optimal joint policy. In this paper, we propose a method that finds a locally optimal joint policy based on a concept called Trembling-hand Perfect Equilibrium (TPE). In finding a TPE, we assume that an agent might make a mistake in selecting its action with small probability. Thus, an observation that has zero probability in JESP-NE has non-zero probability, and the default policy is no longer needed. As a result, JESP-TPE can converge to a better joint policy than JESP-NE, a fact we confirm by experimental evaluations.
UR - http://www.scopus.com/inward/record.url?scp=67650296359&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67650296359&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-01639-4_2
DO - 10.1007/978-3-642-01639-4_2
M3 - Conference contribution
AN - SCOPUS:67650296359
SN - 3642016383
SN - 9783642016387
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 13
EP - 24
BT - Agent Computing and Multi-Agent Systems - 10th Pacific Rim International Conference on Multi-Agents, PRIMA 2007, Revised Papers
T2 - 10th Pacific Rim International Conference on Multi-Agents, PRIMA 2007
Y2 - 21 November 2007 through 23 November 2007
ER -