Paper Title
Decentralized MCTS via Learned Teammate Models
Paper Authors
Abstract
Decentralized online planning can be an attractive paradigm for cooperative multi-agent systems, due to improved scalability and robustness. A key difficulty of such an approach lies in making accurate predictions about the decisions of other agents. In this paper, we present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search, combined with models of teammates learned from previous episodic runs. By allowing only one agent to adapt its models at a time, under the assumption of ideal policy approximation, successive iterations of our method are guaranteed to improve the joint policy and eventually converge to a Nash equilibrium. We test the efficiency of the algorithm by performing experiments in several scenarios of the spatial task allocation environment introduced in [Claes et al., 2015]. We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators that exploit the spatial features of the problem, and that the proposed algorithm improves over baseline planning performance on particularly challenging domain configurations.