Paper Title
Asynchronous Gradient Play in Zero-Sum Multi-agent Games
Paper Authors
Paper Abstract
Finding equilibria via gradient play in competitive multi-agent games has attracted growing attention in recent years, with emphasis on designing efficient strategies where the agents operate in a decentralized and symmetric manner with guaranteed convergence. While significant efforts have been made in understanding zero-sum two-player matrix games, the performance in zero-sum multi-agent games remains inadequately explored, especially in the presence of delayed feedback, leaving the scalability and resiliency of gradient play open to question. In this paper, we make progress by studying asynchronous gradient play in zero-sum polymatrix games under delayed feedback. We first establish that the last iterate of the entropy-regularized optimistic multiplicative weight updates (OMWU) method converges linearly to the quantal response equilibrium (QRE), the solution concept under bounded rationality, in the absence of delays. While this linear convergence continues to hold even when the feedback is randomly delayed under mild statistical assumptions, the method converges at a noticeably slower rate due to a smaller tolerable range of learning rates. Moving beyond, we demonstrate that entropy-regularized OMWU, by adopting two-timescale learning rates in a delay-aware manner, enjoys faster last-iterate convergence under fixed delays, and continues to converge provably even when the delays are arbitrarily bounded in an average-iterate manner. Our methods also lead to finite-time guarantees for approximating the Nash equilibrium (NE) by moderating the amount of regularization. To the best of our knowledge, this work is the first that aims to understand asynchronous gradient play in zero-sum polymatrix games under a wide range of delay assumptions, highlighting the role of learning rate separation.
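For intuition, below is a minimal Python sketch of the entropy-regularized OMWU update in its two-player matrix-game specialization (in the polymatrix setting each agent would run the same per-agent update, with the gradient aggregated over its neighbors). The function name, the toy game, and the exact interleaving of the two sequences are illustrative assumptions rather than the paper's pseudocode; under delayed feedback, `grad` would simply be a stale gradient received from an earlier round.

```python
import numpy as np

def entropy_omwu_step(xbar, grad, eta, tau):
    """One agent's entropy-regularized OMWU update (a minimal sketch).

    xbar : base-sequence iterate (probability vector over actions)
    grad : payoff gradient observed this round (e.g., A @ y in a matrix
           game; with delayed feedback this would be a stale gradient)
    eta  : learning rate; tau : entropy-regularization strength
    Returns the played policy and the new base-sequence iterate.
    """
    def mwu(base, g):
        # Multiplicative weights with entropy shrinkage: the factor
        # (1 - eta * tau) on the log-probabilities pulls the iterate
        # toward the uniform policy.
        logits = (1.0 - eta * tau) * np.log(base) + eta * g
        p = np.exp(logits - logits.max())  # shift max for stability
        return p / p.sum()

    xbar_new = mwu(xbar, grad)   # base (mirror-descent) step
    x_new = mwu(xbar_new, grad)  # optimistic step: reuse grad as the
                                 # prediction of next round's gradient
    return x_new, xbar_new

# Toy run on matching pennies; the last iterate approaches the QRE
# (here the uniform policy), illustrating the last-iterate guarantee.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, xbar = np.full(2, 0.5), np.full(2, 0.5)
y, ybar = np.full(2, 0.5), np.full(2, 0.5)
eta, tau = 0.1, 0.1
for _ in range(2000):
    gx, gy = A @ y, -A.T @ x  # each agent's marginal payoff gradient
    x, xbar = entropy_omwu_step(xbar, gx, eta, tau)
    y, ybar = entropy_omwu_step(ybar, gy, eta, tau)
```

The exponent 1 - eta * tau on the base iterate is exactly what the entropy regularization contributes: it contracts the iterates toward the uniform policy, which is what drives linear last-iterate convergence to the QRE, and shrinking tau trades the QRE for an approximate NE as described in the abstract.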