Paper Title

What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?

Paper Authors

Songyang Han, Sanbao Su, Sihong He, Shuo Han, Haizhao Yang, Shaofeng Zou, Fei Miao

Paper Abstract

Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called the robust agent policy, under which agents aim to maximize the worst-case expected state value. We prove the existence of a robust agent policy for finite-state, finite-action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is publicly available at https://songyanghan.github.io/what_is_solution/.
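
Concretely, the "maximize the worst-case expected state value" objective can be written as a max-min problem. The sketch below uses generic Markov-game notation for illustration: π for the joint agent policy, ρ for the adversary's state-perturbation policy drawn from an admissible set P, T for the transition kernel, R for the reward, and γ for the discount factor. These symbols are our own shorthand and not necessarily the paper's exact formulation:

\[
\pi^{*} \in \arg\max_{\pi} \min_{\rho \in P} \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_{t}, a_{t}) \;\middle|\; a_{t} \sim \pi\big(\cdot \mid \rho(s_{t})\big),\; s_{t+1} \sim T(\cdot \mid s_{t}, a_{t}) \right]
\]

The key feature is that the adversary ρ perturbs only what the agents observe, while transitions and rewards still depend on the true state s_t; the inner minimization models the worst-case state perturbation, and the outer maximization seeks the agent policy that performs best against it.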
