Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL

Title

Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL

Authors

Fengzhuo Zhang, Boyi Liu, Kaixin Wang, Vincent Y. F. Tan, Zhuoran Yang, Zhaoran Wang

Abstract

Cooperative Multi-Agent Reinforcement Learning (MARL) with permutation-invariant agents has achieved tremendous empirical success in real-world applications. Unfortunately, the theoretical understanding of this MARL problem is lacking, owing to the curse of many agents and the limited exploration of relational reasoning in existing works. In this paper, we verify that the transformer implements complex relational reasoning, and we propose and analyze model-free and model-based offline MARL algorithms with transformer approximators. We prove that the suboptimality gaps of the model-free and model-based algorithms are, respectively, independent of and logarithmic in the number of agents, which mitigates the curse of many agents. These results are consequences of a novel generalization error bound for the transformer and a novel analysis of the Maximum Likelihood Estimate (MLE) of the system dynamics with the transformer. Our model-based algorithm is the first provably efficient MARL algorithm that explicitly exploits the permutation invariance of the agents. Our improved generalization bound may be of independent interest and is applicable to other regression problems related to the transformer beyond MARL.
