Paper Title
Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic Mean-Field Games
Paper Authors
Paper Abstract
In this paper, we study large-population multi-agent reinforcement learning (RL) in the context of discrete-time linear-quadratic mean-field games (LQ-MFGs). Our setting differs from most existing work on RL for MFGs in that we consider a non-stationary MFG over an infinite horizon. We propose an actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE) of the LQ-MFG. There are two primary challenges: i) the non-stationarity of the MFG induces a linear-quadratic tracking problem, which requires solving a backwards-in-time (non-causal) equation that standard (causal) RL algorithms cannot handle; ii) many RL algorithms assume that states are sampled from the stationary distribution of a Markov chain (MC), i.e., that the chain has already mixed, an assumption that is not satisfied for real data sources. We first identify that the mean-field trajectory follows linear dynamics, which allows the problem to be reformulated as a linear-quadratic Gaussian (LQG) problem. Under this reformulation, we propose an actor-critic algorithm that allows samples to be drawn from an unmixed MC. Finite-sample convergence guarantees for the algorithm are then provided. To characterize the performance of our algorithm in multi-agent RL, we develop an error bound with respect to the Nash equilibrium of the finite-population game.
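The following minimal NumPy sketch is illustrative only and is not the paper's algorithm: the dynamics matrices A and B, the gains K and L, and the linear feedback policy u = -K x + L z are hypothetical choices. It demonstrates the observation the abstract relies on: when every agent in a large LQ population plays the same linear feedback policy, the empirical mean-field trajectory approximately follows a linear recursion, which is what permits recasting the tracking problem as a linear-quadratic Gaussian problem.

```python
# Illustrative sketch (not the paper's method): a large population of LQ agents,
# each with dynamics x_{t+1} = A x_t + B u_t + w_t, plays the hypothetical linear
# feedback policy u_t = -K x_t + L z_t, where z_t is the population mean field.
# Averaging over agents, the noise washes out and z_t follows the linear recursion
# z_{t+1} = (A - B K + B L) z_t, i.e., the mean-field trajectory is itself linear.
import numpy as np

rng = np.random.default_rng(0)

n, d = 5000, 2                      # number of agents, state dimension
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.eye(d)
K = 0.3 * np.eye(d)                 # hypothetical feedback gain on the agent's own state
L = 0.1 * np.eye(d)                 # hypothetical feedback gain on the mean field

X = rng.normal(size=(n, d))         # initial agent states
z = X.mean(axis=0)                  # empirical mean field
z_pred = z.copy()                   # mean field predicted by the linear recursion

A_cl = A - B @ K + B @ L            # closed-loop matrix governing the mean field

for t in range(50):
    U = -X @ K.T + z @ L.T          # every agent applies u_t = -K x_t + L z_t
    W = 0.1 * rng.normal(size=(n, d))
    X = X @ A.T + U @ B.T + W       # x_{t+1} = A x_t + B u_t + w_t
    z = X.mean(axis=0)              # realized empirical mean field
    z_pred = A_cl @ z_pred          # prediction from the linear mean-field dynamics

print("empirical mean field:        ", z)
print("linear-recursion prediction: ", z_pred)
```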