TY - GEN
T1 - Introducing communication in Dis-POMDPs with Finite State Machines
AU - Iwanari, Yuki
AU - Tasaki, Makoto
AU - Yokoo, Makoto
AU - Iwasaki, Atsushi
AU - Sakurai, Yuko
PY - 2009/12/1
Y1 - 2009/12/1
N2 - Distributed Partially Observable Markov Decision Problems (Dis-POMDPs) are emerging as a popular approach for modeling sequential decision making in teams operating under uncertainty. To achieve coherent agent behavior, appropriate run-time communication is essential, and many run-time communication schemes for Dis-POMDPs have been studied. Separately, a Finite State Machine (FSM) is a popular representation for a local policy that operates over a very long or infinite time horizon. In this paper, we examine a run-time communication scheme in which the local policy of each agent is represented as an FSM. In this scheme, the meaning of each message is not predefined; it is given implicitly by the interaction between local policies. We propose an iterative-improvement algorithm that searches for a joint policy under the assumption that run-time communication incurs a cost, so that agents communicate only when doing so is cost-effective. Our algorithm can find a joint policy that obtains a better expected reward than a hand-crafted joint policy while requiring fewer nodes in the local FSMs and fewer message types. Furthermore, we show experimentally that our algorithm can obtain a joint policy consisting of sufficiently complex local FSMs within a reasonable amount of time.
UR - http://www.scopus.com/inward/record.url?scp=84856839969&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84856839969&partnerID=8YFLogxK
U2 - 10.1109/WI-IAT.2009.161
DO - 10.1109/WI-IAT.2009.161
M3 - Conference contribution
AN - SCOPUS:84856839969
SN - 9780769538013
T3 - Proceedings - 2009 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IAT 2009
SP - 267
EP - 270
BT - Proceedings - 2009 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IAT 2009
T2 - 2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2009
Y2 - 15 September 2009 through 18 September 2009
ER -