Paper Title

DREAM: Deep Regret minimization with Advantage baselines and Model-free learning

Paper Authors

Eric Steinberger, Adam Lerer, Noam Brown

Paper Abstract

We introduce DREAM, a deep reinforcement learning algorithm that finds optimal strategies in imperfect-information games with multiple agents. Formally, DREAM converges to a Nash Equilibrium in two-player zero-sum games and to an extensive-form coarse correlated equilibrium in all other games. Our primary innovation is an effective algorithm that, in contrast to other regret-based deep learning algorithms, does not require access to a perfect simulator of the game to achieve good performance. We show that DREAM empirically achieves state-of-the-art performance among model-free algorithms in popular benchmark games, and is even competitive with algorithms that do use a perfect simulator.
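The abstract places DREAM in the family of regret-minimization methods whose time-averaged strategies converge to a Nash equilibrium in two-player zero-sum games. As background only, the sketch below implements plain tabular regret matching on rock-paper-scissors; it is not DREAM itself, which replaces exact regret tables with neural-network advantage baselines and learns model-free from sampled trajectories rather than from a full payoff matrix. The payoff matrix, function names, and iteration count are illustrative assumptions, not taken from the paper.

```python
# Minimal tabular regret-matching sketch on rock-paper-scissors (background only,
# NOT the DREAM algorithm): the time-averaged strategies approach the uniform
# Nash equilibrium as iterations grow.
import numpy as np

ACTIONS = 3  # rock, paper, scissors
# Payoff matrix for player 1 (rows) vs. player 2 (columns): +1 win, -1 loss, 0 tie.
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)

def regret_matching_strategy(regret_sum):
    """Map cumulative regrets to a strategy; uniform if no positive regret."""
    positive = np.maximum(regret_sum, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(ACTIONS, 1.0 / ACTIONS)

def self_play(iterations=20000):
    regret_sum = [np.zeros(ACTIONS), np.zeros(ACTIONS)]
    strategy_sum = [np.zeros(ACTIONS), np.zeros(ACTIONS)]
    for _ in range(iterations):
        strategies = [regret_matching_strategy(r) for r in regret_sum]
        for p in range(2):
            strategy_sum[p] += strategies[p]
        # Expected payoff of each pure action against the opponent's current mix.
        u1 = PAYOFF @ strategies[1]        # player 1's action values
        u2 = -PAYOFF.T @ strategies[0]     # player 2's action values (zero-sum)
        ev1 = strategies[0] @ u1
        ev2 = strategies[1] @ u2
        regret_sum[0] += u1 - ev1          # accumulate regrets for each action
        regret_sum[1] += u2 - ev2
    # Normalize the accumulated strategies to get the average strategy profile.
    return [s / s.sum() for s in strategy_sum]

if __name__ == "__main__":
    avg1, avg2 = self_play()
    print("Average strategies:", np.round(avg1, 3), np.round(avg2, 3))
```

Note that this toy example assumes full knowledge of the payoff matrix; the point of DREAM, per the abstract, is to reach the same kind of equilibrium guarantee without a perfect simulator, using sampled play and learned advantage baselines in place of exact expected values.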
