TY - GEN

T1 - Multiagent planning with trembling-hand perfect equilibrium in multiagent POMDPs

AU - Yabu, Yuichi

AU - Yokoo, Makoto

AU - Iwasaki, Atsushi

PY - 2009

Y1 - 2009

N2 - Multiagent Partially Observable Markov Decision Processes are a popular model of multiagent systems with uncertainty. Since the computational cost of finding an optimal joint policy is prohibitive, Joint Equilibrium-based Search for Policies with Nash Equilibrium (JESP-NE) has been proposed; it finds a locally optimal joint policy in which each policy is a best response to the other policies, i.e., the joint policy is a Nash equilibrium. One limitation of JESP-NE is that the quality of the obtained joint policy depends on a predefined default policy. More specifically, when finding a best response, if some observations have zero probability, JESP-NE uses this default policy for them. If the default policy is poor, JESP-NE tends to converge to a sub-optimal joint policy. In this paper, we propose a method, JESP-TPE, that finds a locally optimal joint policy based on a concept called Trembling-hand Perfect Equilibrium (TPE). In finding a TPE, we assume that an agent might make a mistake in selecting its action with small probability. Thus, an observation that has zero probability in JESP-NE has non-zero probability, and the default policy is no longer needed. As a result, JESP-TPE can converge to a better joint policy than JESP-NE, which we confirm through experimental evaluations.

AB - Multiagent Partially Observable Markov Decision Processes are a popular model of multiagent systems with uncertainty. Since the computational cost of finding an optimal joint policy is prohibitive, Joint Equilibrium-based Search for Policies with Nash Equilibrium (JESP-NE) has been proposed; it finds a locally optimal joint policy in which each policy is a best response to the other policies, i.e., the joint policy is a Nash equilibrium. One limitation of JESP-NE is that the quality of the obtained joint policy depends on a predefined default policy. More specifically, when finding a best response, if some observations have zero probability, JESP-NE uses this default policy for them. If the default policy is poor, JESP-NE tends to converge to a sub-optimal joint policy. In this paper, we propose a method, JESP-TPE, that finds a locally optimal joint policy based on a concept called Trembling-hand Perfect Equilibrium (TPE). In finding a TPE, we assume that an agent might make a mistake in selecting its action with small probability. Thus, an observation that has zero probability in JESP-NE has non-zero probability, and the default policy is no longer needed. As a result, JESP-TPE can converge to a better joint policy than JESP-NE, which we confirm through experimental evaluations.

UR - http://www.scopus.com/inward/record.url?scp=67650296359&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67650296359&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-01639-4_2

DO - 10.1007/978-3-642-01639-4_2

M3 - Conference contribution

AN - SCOPUS:67650296359

SN - 3642016383

SN - 9783642016387

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 13

EP - 24

BT - Agent Computing and Multi-Agent Systems - 10th Pacific Rim International Conference on Multi-Agents, PRIMA 2007, Revised Papers

T2 - 10th Pacific Rim International Conference on Multi-Agents, PRIMA 2007

Y2 - 21 November 2007 through 23 November 2007

ER -