Paper Title

Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning

Authors

Ronghui Mu, Wenjie Ruan, Leandro Soriano Marcolino, Gaojie Jin, Qiang Ni

Abstract

Cooperative multi-agent reinforcement learning (c-MARL) is widely applied in safety-critical scenarios, so the analysis of robustness for c-MARL models is profoundly important. However, robustness certification for c-MARL has not yet been explored in the community. In this paper, we propose a novel certification method, the first work to leverage a scalable approach for c-MARL to determine actions with guaranteed certified bounds. c-MARL certification poses two key challenges compared with single-agent systems: (i) the accumulated uncertainty as the number of agents increases; and (ii) the potential lack of impact on the global team reward when changing the action of a single agent. These challenges prevent us from directly applying existing algorithms. Hence, we employ the false discovery rate (FDR) controlling procedure, accounting for the importance of each agent, to certify per-state robustness, and we propose a tree-search-based algorithm to find a lower bound of the global reward under the minimal certified perturbation. As our method is general, it can also be applied in single-agent environments. We empirically show that our certification bounds are much tighter than state-of-the-art RL certification solutions. We also run experiments on two popular c-MARL algorithms, QMIX and VDN, in two different environments, with two and four agents. The experimental results show that our method produces meaningful guaranteed robustness for all models and environments. Our tool CertifyCMARL is available at https://github.com/TrustAI/CertifyCMA
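The two ingredients named in the abstract can be illustrated with a minimal sketch: randomized smoothing of a per-agent policy (majority vote over Gaussian-perturbed observations) and a Benjamini-Hochberg step-up procedure to control the FDR across agents' per-state robustness tests. This is not the authors' implementation; the `policy` interface, noise model, and parameter names are illustrative assumptions.

```python
import numpy as np

def smoothed_action(policy, obs, sigma=0.1, n_samples=1000, rng=None):
    """Randomized smoothing: majority-vote action of a deterministic
    policy under isotropic Gaussian noise added to the observation."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + obs.shape)
    actions = np.array([policy(obs + e) for e in noise])
    counts = np.bincount(actions)          # vote count per discrete action
    return int(counts.argmax()), counts

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean mask of
    rejected null hypotheses while controlling the FDR at level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)                  # sort p-values ascending
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()     # largest index meeting its threshold
        rejected[order[: k + 1]] = True    # reject everything up to rank k
    return rejected
```

In a c-MARL setting, one p-value per agent (from a binomial test on its vote counts) would be fed to `benjamini_hochberg`, so that the set of agents certified at a given state keeps the expected fraction of false certifications below `alpha`.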
