Paper Title
Effects of Spectral Normalization in Multi-agent Reinforcement Learning
Paper Authors
Paper Abstract
A reliable critic is central to on-policy actor-critic learning. However, learning a reliable critic becomes challenging in a multi-agent sparse reward scenario due to two factors: 1) the joint action space grows exponentially with the number of agents, and 2) this, combined with the reward sparseness and environment noise, leads to large sample requirements for accurate learning. We show that regularising the critic with spectral normalization (SN) enables it to learn more robustly, even in multi-agent on-policy sparse reward scenarios. Our experiments show that the regularised critic quickly learns from the sparse rewarding experience in the complex SMAC and RWARE domains. These findings highlight the importance of regularisation in the critic for stable learning.
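For illustration, below is a minimal sketch (not the authors' implementation) of how a critic network could be regularised with spectral normalization, assuming a PyTorch-style setup. The class name SNCritic, the layer sizes, and the choice to leave the final value head unnormalised are assumptions made here for the example.

# A minimal sketch, assuming PyTorch; not the paper's code.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class SNCritic(nn.Module):
    """Centralised state-value critic whose hidden layers are spectrally normalised."""

    def __init__(self, state_dim: int, hidden_dim: int = 128):
        super().__init__()
        # spectral_norm constrains each wrapped weight matrix's largest singular
        # value to roughly one, bounding the layer's Lipschitz constant.
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(state_dim, hidden_dim)),
            nn.ReLU(),
            spectral_norm(nn.Linear(hidden_dim, hidden_dim)),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # value head left unnormalised in this sketch
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Maps a (batch of) global state(s) to scalar value estimates.
        return self.net(state)


# Usage with a dummy global-state dimension of 48 (a hypothetical size).
critic = SNCritic(state_dim=48)
values = critic(torch.randn(4, 48))  # batch of 4 joint states -> 4 value estimates
print(values.shape)  # torch.Size([4, 1])

In this kind of setup, spectral_norm re-estimates each wrapped layer's largest singular value via power iteration during forward passes, which keeps the per-layer Lipschitz constant near one and tends to smooth the critic's value estimates under noisy, sparse reward signals.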