Introducing communication to joint policy search algorithm for networked distributed POMDPs

Makoto Tasaki, Yuichi Yabu, Makoto Yokoo, Pradeep Varakantham, Janusz Marecki, Milind Tambe

Research output: Contribution to journal (Article)

Abstract

The multiagent partially observable Markov decision process (multiagent POMDP) is a popular approach for modeling multi-agent systems acting in uncertain domains. An existing algorithm, SPIDER (Search for Policies In Distributed EnviRonments), is guaranteed to find an optimal joint plan by exploiting the structure of agent interactions. Using SPIDER, we can obtain an optimal joint policy for large-scale problems if the interaction among agents is sparse. However, a local policy is still too large for SPIDER to obtain a policy whose length is more than 4. To overcome this problem, we extend SPIDER so that agents can communicate their observation and action histories to each other. After communication, agents can start from a new synchronized belief state, so the combinatorial explosion of local policies is avoided. Our experimental results show that we can obtain much longer policies as long as the interval between communications is small.
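
As a rough illustration of why local policies become unmanageable as their length grows, the sketch below counts the nodes of a single deterministic local policy tree and the number of such trees. It is not the SPIDER algorithm or the authors' extension; the function names and the example sizes (3 actions, 2 observations) are assumptions chosen only for illustration. It uses the standard fact that a policy tree of horizon T over |O| observations has 1 + |O| + ... + |O|^(T-1) action nodes.

```python
def policy_tree_nodes(num_observations: int, horizon: int) -> int:
    """Action nodes in one local policy tree: one node per observation history."""
    return sum(num_observations ** t for t in range(horizon))


def num_deterministic_policies(num_actions: int, num_observations: int,
                               horizon: int) -> int:
    """Each node independently picks one action, so the count is |A| ** nodes."""
    return num_actions ** policy_tree_nodes(num_observations, horizon)


if __name__ == "__main__":
    # Hypothetical sizes, not taken from the paper.
    num_actions, num_observations = 3, 2
    for horizon in (2, 4, 8):
        nodes = policy_tree_nodes(num_observations, horizon)
        count = num_deterministic_policies(num_actions, num_observations, horizon)
        print(f"horizon={horizon}: {nodes} nodes, "
              f"{len(str(count))}-digit number of policies")
```

Under these assumed sizes, a tree of horizon 8 has 255 nodes, while a tree that only has to cover 2 steps between communications has 3. This is the intuition behind restarting from a synchronized belief state after each communication, although the actual search space in the paper also depends on details not given in the abstract.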

Original language: English
Pages (from-to): 226-237
Number of pages: 12
Journal: Computer Software
Volume: 25
Issue number: 4
Publication status: Published - 2008

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Tasaki, M., Yabu, Y., Yokoo, M., Varakantham, P., Marecki, J., & Tambe, M. (2008). Introducing communication to joint policy search algorithm for networked distributed POMDPs. Computer Software, 25(4), 226-237.