Paper Title

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

Authors

Scott Emmons, Caspar Oesterheld, Andrew Critch, Vincent Conitzer, Stuart Russell

Abstract

Although it has been known since the 1970s that a globally optimal strategy profile in a common-payoff game is a Nash equilibrium, global optimality is a strict requirement that limits the result's applicability. In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium. Furthermore, we show that this result is robust to perturbations to the common payoff and to the local optimum. Applied to machine learning, our result provides a global guarantee for any gradient method that finds a local optimum in symmetric strategy space. While this result indicates stability to unilateral deviation, we nevertheless identify broad classes of games where mixed local optima are unstable under joint, asymmetric deviations. We analyze the prevalence of instability by running learning algorithms in a suite of symmetric games, and we conclude by discussing the applicability of our results to multi-agent RL, cooperative inverse RL, and decentralized POMDPs.
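The headline result can be illustrated numerically. The sketch below (not from the paper; the payoff matrix and learning rate are illustrative assumptions) runs gradient ascent in the symmetric strategy space of a 2-action common-payoff symmetric game, then checks that the local optimum it finds admits no profitable unilateral deviation, i.e. is a Nash equilibrium:

```python
# Sketch, assuming a toy 2-action coordination game: gradient ascent on the
# *shared* mixed strategy of a symmetric common-payoff game, followed by a
# Nash-equilibrium check on the resulting local optimum.
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])  # common payoff; A == A.T makes the game symmetric

def payoff(p):
    """Expected common payoff when both players play x = (p, 1 - p)."""
    x = np.array([p, 1.0 - p])
    return x @ A @ x

# Projected gradient ascent on the single shared parameter p in [0, 1].
p, lr = 0.4, 0.05
for _ in range(500):
    x = np.array([p, 1.0 - p])
    grad = 2.0 * (A @ x)                 # d(x^T A x)/dx for symmetric A
    p = np.clip(p + lr * (grad[0] - grad[1]), 0.0, 1.0)

# Nash check: no unilateral pure deviation beats the symmetric profile.
x = np.array([p, 1.0 - p])
best_deviation = (A @ x).max()           # payoff of player 1's best pure reply
assert best_deviation <= payoff(p) + 1e-9
```

Here ascent converges to the pure profile p = 1 (both players take action 0, payoff 2), and the deviation check passes, matching the theorem: the local optimum found in symmetric strategy space is a (global) Nash equilibrium.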
