Paper Title
Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic Mean-Field Games
Paper Authors
Paper Abstract
In this paper, we study large-population multi-agent reinforcement learning (RL) in the context of discrete-time linear-quadratic mean-field games (LQ-MFGs). Our setting differs from most existing work on RL for MFGs in that we consider a non-stationary MFG over an infinite horizon. We propose an actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE) of the LQ-MFG. There are two primary challenges: i) the non-stationarity of the MFG induces a linear-quadratic tracking problem, which requires solving a backwards-in-time (non-causal) equation that standard (causal) RL algorithms cannot handle; ii) many RL algorithms assume that states are sampled from the stationary distribution of a Markov chain (MC), i.e., that the chain has already mixed, an assumption that is not satisfied for real data sources. We first identify that the mean-field trajectory follows linear dynamics, which allows the problem to be reformulated as a linear-quadratic Gaussian (LQG) problem. Under this reformulation, we propose an actor-critic algorithm that allows samples to be drawn from an unmixed MC. Finite-sample convergence guarantees for the algorithm are then provided. To characterize the performance of our algorithm in multi-agent RL, we develop an error bound with respect to the Nash equilibrium of the finite-population game.
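The following minimal NumPy sketch is illustrative only and is not the paper's algorithm: the dynamics matrices A and B, the gains K and L, and the linear feedback policy u = -K x + L z are hypothetical choices. It demonstrates the observation the abstract relies on: when every agent in a large LQ population plays the same linear feedback policy, the empirical mean-field trajectory approximately follows a linear recursion, which is what permits recasting the tracking problem as a linear-quadratic Gaussian problem.

```python
# Illustrative sketch (not the paper's method): a large population of LQ agents,
# each with dynamics x_{t+1} = A x_t + B u_t + w_t, plays the hypothetical linear
# feedback policy u_t = -K x_t + L z_t, where z_t is the population mean field.
# Averaging over agents, the noise washes out and z_t follows the linear recursion
# z_{t+1} = (A - B K + B L) z_t, i.e., the mean-field trajectory is itself linear.
import numpy as np

rng = np.random.default_rng(0)

n, d = 5000, 2                      # number of agents, state dimension
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.eye(d)
K = 0.3 * np.eye(d)                 # hypothetical feedback gain on the agent's own state
L = 0.1 * np.eye(d)                 # hypothetical feedback gain on the mean field

X = rng.normal(size=(n, d))         # initial agent states
z = X.mean(axis=0)                  # empirical mean field
z_pred = z.copy()                   # mean field predicted by the linear recursion

A_cl = A - B @ K + B @ L            # closed-loop matrix governing the mean field

for t in range(50):
    U = -X @ K.T + z @ L.T          # every agent applies u_t = -K x_t + L z_t
    W = 0.1 * rng.normal(size=(n, d))
    X = X @ A.T + U @ B.T + W       # x_{t+1} = A x_t + B u_t + w_t
    z = X.mean(axis=0)              # realized empirical mean field
    z_pred = A_cl @ z_pred          # prediction from the linear mean-field dynamics

print("empirical mean field:        ", z)
print("linear-recursion prediction: ", z_pred)
```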